Output Layers
- class dlordinal.output_layers.BinomialLayer(*, in_features: int, num_classes: int)
Unimodal output layer for ordinal classification based on the binomial distribution. Proposed by Beckham and Pal[1].
Learns the p parameter of the binomial distribution from the input features and uses the binomial distribution to compute the probabilities of each class, ensuring that the output is unimodal and that the probabilities sum to 1. The sigmoid of the linear layer output is used to ensure that the p parameter is between 0 and 1.
- Parameters:
in_features (int) – Number of input features (output features from the previous layer).
num_classes (int) – Number of output classes. Defines the support of the binomial distribution (0 to num_classes - 1).
- p_layer
Linear layer that maps input features to a scalar logit.
- Type:
torch.nn.Linear
- num_classes
Number of classes used to define the binomial distribution.
- Type:
int
Example
>>> import torch
>>> from dlordinal.output_layers import BinomialLayer
>>> layer = BinomialLayer(in_features=5, num_classes=3)
>>> input = torch.randn(2, 5)
>>> probs = layer(input)
>>> print(probs)
- forward(input: Tensor) → Tensor
Compute class probabilities using a binomial distribution.
- Parameters:
input (torch.Tensor, shape (batch_size, in_features)) – Input feature tensor.
- Returns:
Probability distribution over classes.
- Return type:
torch.Tensor, shape (batch_size, num_classes)
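The mapping from a scalar logit to class probabilities described for BinomialLayer can be sketched in plain Python. This is a minimal illustration of the math only, not the library's implementation; the function name `binomial_class_probs` is hypothetical, and the logit is passed in directly rather than produced by a linear layer.

```python
import math


def binomial_class_probs(logit: float, num_classes: int) -> list[float]:
    """Binomial PMF over classes 0..num_classes-1 with p = sigmoid(logit).

    Hypothetical helper for illustration; not part of dlordinal.
    """
    p = 1.0 / (1.0 + math.exp(-logit))  # sigmoid keeps p in (0, 1)
    n = num_classes - 1                  # number of Bernoulli trials
    return [
        math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
        for k in range(num_classes)
    ]


probs = binomial_class_probs(logit=0.0, num_classes=3)
print(probs)  # [0.25, 0.5, 0.25]
```

Because the binomial PMF has a single peak and sums to 1 over its support, the resulting class probabilities are unimodal by construction.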
- class dlordinal.output_layers.CLM(num_classes: int, link_function: Literal['logit', 'probit', 'cloglog'], min_distance: int = 0.0, **kwargs)
Implementation of the cumulative link models from Vargas et al.[2] as a torch layer. Different link functions can be used, including logit, probit and cloglog.
- Parameters:
num_classes (int) – The number of classes.
link_function (str) – The link function to use. Can be 'logit', 'probit' or 'cloglog'.
min_distance (float, default=0.0) – The minimum distance between thresholds.
- num_classes
The number of classes.
- Type:
int
- link_function
The link function to use. Can be 'logit', 'probit' or 'cloglog'.
- Type:
str
- min_distance
The minimum distance between thresholds.
- Type:
float
- dist_
The standard normal distribution N(0, 1) used to compute the probit link function.
- Type:
torch.distributions.Normal
- thresholds_b_
The torch parameter for the first threshold.
- Type:
torch.nn.Parameter
- thresholds_a_
The torch parameter for the alphas of the thresholds.
- Type:
torch.nn.Parameter
Example
>>> import torch
>>> from dlordinal.output_layers import CLM
>>> inp = torch.randn(10, 5)
>>> fc = torch.nn.Linear(5, 1)
>>> clm = CLM(5, "logit")
>>> output = clm(fc(inp))
>>> print(output)
tensor([[0.7944, 0.1187, 0.0531, 0.0211, 0.0127],
        [0.4017, 0.2443, 0.1862, 0.0987, 0.0690],
        [0.4619, 0.2381, 0.1638, 0.0814, 0.0548],
        [0.4636, 0.2378, 0.1632, 0.0809, 0.0545],
        [0.4330, 0.2419, 0.1746, 0.0893, 0.0612],
        [0.5006, 0.2309, 0.1495, 0.0716, 0.0473],
        [0.6011, 0.2027, 0.1138, 0.0504, 0.0320],
        [0.5995, 0.2032, 0.1144, 0.0507, 0.0322],
        [0.4014, 0.2443, 0.1863, 0.0988, 0.0691],
        [0.6922, 0.1672, 0.0838, 0.0351, 0.0217]], grad_fn=<CopySlices>)
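The idea behind a cumulative link model with the logit link can be sketched in plain Python: cumulative probabilities are obtained from ordered thresholds, and class probabilities are their consecutive differences. The helper names below are hypothetical, and the way thresholds are built from a first threshold, squared "alpha" increments and `min_distance` is an assumption made for illustration, not a transcription of the library's code.

```python
import math


def build_thresholds(b: float, alphas: list[float], min_distance: float = 0.0) -> list[float]:
    """Ordered thresholds from a first threshold and per-gap parameters.

    Assumption: each gap is alpha**2 + min_distance, which keeps the
    thresholds non-decreasing. Hypothetical helper, not dlordinal's API.
    """
    thresholds = [b]
    for a in alphas:
        thresholds.append(thresholds[-1] + a * a + min_distance)
    return thresholds


def clm_logit_probs(z: float, thresholds: list[float]) -> list[float]:
    """Class probabilities from a scalar projection z and ordered thresholds."""
    # Cumulative probabilities P(y <= k) = sigmoid(threshold_k - z).
    cumulative = [1.0 / (1.0 + math.exp(-(t - z))) for t in thresholds]
    cumulative.append(1.0)  # the last class closes the distribution
    # Class probabilities are differences of consecutive cumulative values.
    probs = [cumulative[0]]
    probs += [cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))]
    return probs


thresholds = build_thresholds(b=-1.0, alphas=[1.0, 1.0, 1.0])
probs = clm_logit_probs(z=0.0, thresholds=thresholds)
print(thresholds)
print(probs)
```

With four thresholds the sketch yields five class probabilities, which matches the shape of the CLM(5, "logit") example above, where a single-output linear layer feeds the CLM layer.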
- class dlordinal.output_layers.COPOC(phi: Callable[[Tensor], Tensor] = <function COPOC.<lambda>>, psi: Callable[[Tensor], Tensor] = <function COPOC.<lambda>>)
Implements the Conformal Predictions for Ordinal Classification (COPOC) output layer from Dey et al.[3], which enforces unimodality in the output probabilities in a non-parametric way.
- Parameters:
phi (Callable[[Tensor], Tensor]) – Non-negative transformation function. Default is absolute value function \(\phi(x)=|x|\).
psi (Callable[[Tensor], Tensor]) – Strictly monotonic decreasing bijective function. Default is negative absolute value function \(\psi(x)=-|x|\).
Example
>>> import torch
>>> from dlordinal.output_layers import COPOC
>>> inp = torch.randn(10, 5)
>>> fc = torch.nn.Linear(5, 5)
>>> copoc = COPOC()
>>> output = torch.nn.functional.softmax(copoc(fc(inp)), dim=1)
>>> print(output)
tensor([[0.1898, 0.1901, 0.2568, 0.2196, 0.1436],
        [0.4538, 0.3191, 0.1412, 0.0529, 0.0330],
        [0.3371, 0.2554, 0.2151, 0.1047, 0.0876],
        [0.1859, 0.2073, 0.2658, 0.1889, 0.1520],
        [0.3306, 0.2195, 0.1982, 0.1303, 0.1214],
        [0.2132, 0.3768, 0.1590, 0.1278, 0.1232],
        [0.1531, 0.1544, 0.2094, 0.2451, 0.2381],
        [0.4986, 0.2240, 0.1689, 0.0590, 0.0495],
        [0.5838, 0.2201, 0.1289, 0.0507, 0.0166],
        [0.1639, 0.1969, 0.2100, 0.2347, 0.1946]], grad_fn=<SoftmaxBackward0>)
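One way the default phi and psi can produce unimodal scores is sketched below in plain Python. The construction (keep the first input, turn the remaining inputs into non-negative increments with phi, accumulate them, then apply psi) is an assumption based on the parameter descriptions above; the function names are hypothetical and this is not a transcription of the library's code.

```python
import math
from itertools import accumulate


def copoc_scores(z, phi=abs, psi=lambda v: -abs(v)):
    """Unimodal scores from unconstrained inputs (assumed construction)."""
    # The first entry stays free; later entries become non-negative increments.
    increments = [z[0]] + [phi(x) for x in z[1:]]
    v = list(accumulate(increments))  # non-decreasing sequence
    # -|v| rises while v approaches 0 from below and falls once v moves above 0.
    return [psi(x) for x in v]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def is_unimodal(p):
    """True if p is non-decreasing up to its maximum and non-increasing after."""
    peak = p.index(max(p))
    rising = all(p[i] <= p[i + 1] for i in range(peak))
    falling = all(p[i] >= p[i + 1] for i in range(peak, len(p) - 1))
    return rising and falling


probs = softmax(copoc_scores([-1.5, 0.7, -0.4, 1.2, -0.3]))
print(probs, is_unimodal(probs))
```

Softmax is monotone in its inputs, so applying it to the scores preserves their single peak. This matches the example above, where softmax is applied to the layer output.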
- class dlordinal.output_layers.GaussianUncertaintyLayer(in_features: int, num_classes: int)
Discretized Gaussian Uncertainty (GU) layer proposed by Araújo et al.[4].
Produces a discrete unimodal distribution over num_classes classes, derived from a Gaussian with learnable mean (mu) and standard deviation (sigma).
- Parameters:
in_features (int) – Number of input features (output of the previous layer).
num_classes (int) – Number of discrete output classes.
- num_classes
Number of discrete output classes.
- Type:
int
- mu_layer
Linear layer used to predict the mean (mu) of the Gaussian.
- Type:
nn.Linear
- sigma_layer
Linear layer used to predict the standard deviation (sigma).
- Type:
nn.Linear
Notes
The standard deviation is parameterised in an unconstrained way and transformed using softplus to ensure positivity.
The resulting values are normalised to form a valid probability distribution.
Example
>>> import torch
>>> from dlordinal.output_layers import GaussianUncertaintyLayer
>>> layer = GaussianUncertaintyLayer(in_features=5, num_classes=3)
>>> input = torch.randn(2, 5)
>>> probs, sigma = layer(input)
>>> print(probs)
- forward(x: Tensor) → tuple[Tensor, Tensor]
Forward pass.
- Parameters:
x (torch.Tensor) – Input tensor of shape (batch_size, in_features).
- Returns:
probs (torch.Tensor) – Discrete probability distribution over classes. Shape: (batch_size, num_classes).
sigma (torch.Tensor) – Predicted standard deviation for each sample. Shape: (batch_size,).
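The discretisation described in the Notes can be sketched for a single sample in plain Python. The sketch assumes the Gaussian density is evaluated at the integer class indices 0..num_classes-1 and then normalised; the exact class positions and discretisation used by the library may differ, and the function names are hypothetical.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable softplus, log(1 + exp(x)); always positive."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def discretized_gaussian(mu: float, raw_sigma: float, num_classes: int):
    """Return (probs, sigma) for one sample (assumed discretisation)."""
    sigma = softplus(raw_sigma)  # unconstrained value mapped to sigma > 0
    # Unnormalised Gaussian density at each class index.
    weights = [math.exp(-0.5 * ((k - mu) / sigma) ** 2) for k in range(num_classes)]
    total = sum(weights)
    return [w / total for w in weights], sigma


probs, sigma = discretized_gaussian(mu=1.0, raw_sigma=0.0, num_classes=3)
print(probs, sigma)
```

Here the mean sits on the middle class, so the probabilities are symmetric around it. A larger sigma spreads the mass more evenly across classes.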
- class dlordinal.output_layers.PoissonLayer(*, in_features: int, num_classes: int, learn_tau: bool = True)
Unimodal output layer for ordinal classification based on the Poisson distribution. Proposed by Beckham and Pal[1].
Learns the λ parameter of the Poisson distribution from the input features and uses the Poisson distribution to compute the probabilities of each class, ensuring that the output is unimodal and that the probabilities sum to 1. The softplus of the linear layer output is used to ensure that the λ parameter is positive. Additionally, its value is clamped between 1e-8 and 1e4 to prevent numerical issues.
The layer includes an optional learnable temperature parameter τ that controls the sharpness of the output distribution. Higher values of τ produce softer distributions, while lower values produce sharper distributions. If learn_tau is set to False, τ is fixed at 1 (no scaling).
- Parameters:
in_features (int) – Size of the input feature vector (output features from the previous layer).
num_classes (int) – Number of discrete output classes. Defines support of the distribution as {0, …, num_classes - 1}.
learn_tau (bool, default=True) – If True, the temperature parameter τ is learned as a model parameter. Otherwise, it is stored as a fixed buffer.
- lambda_layer
Linear transformation that maps input features to a scalar rate λ.
- Type:
torch.nn.Linear
- log_tau
Log-temperature parameter used to control sharpness of the distribution.
- Type:
torch.Tensor or torch.nn.Parameter
- num_classes
Number of output classes.
- Type:
int
- learn_tau
Whether temperature is learnable.
- Type:
bool
Example
>>> import torch
>>> from dlordinal.output_layers import PoissonLayer
>>> layer = PoissonLayer(in_features=5, num_classes=3, learn_tau=True)
>>> input = torch.randn(2, 5)
>>> probs = layer(input)
>>> print(probs)
- forward(input: Tensor) → Tensor
Compute class probabilities using a Poisson-based discrete distribution.
- Parameters:
input (torch.Tensor, shape (batch_size, in_features)) – Input feature tensor.
- Returns:
Probability distribution over discrete classes.
- Return type:
torch.Tensor, shape (batch_size, num_classes)
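The effect of λ and the temperature τ described for PoissonLayer can be sketched for a single sample in plain Python. The softplus and the clamp to [1e-8, 1e4] follow the description above. How τ enters (dividing the Poisson log-probabilities before a softmax over the truncated support) is an assumption for illustration, and the function name is hypothetical.

```python
import math


def poisson_class_probs(raw: float, num_classes: int, tau: float = 1.0) -> list[float]:
    """Class probabilities from a truncated, temperature-scaled Poisson."""
    lam = max(raw, 0.0) + math.log1p(math.exp(-abs(raw)))  # softplus, lam > 0
    lam = min(max(lam, 1e-8), 1e4)                          # clamp for stability
    # Poisson log-PMF at k: k*log(lam) - lam - log(k!)
    log_pmf = [k * math.log(lam) - lam - math.lgamma(k + 1) for k in range(num_classes)]
    scaled = [v / tau for v in log_pmf]  # assumed role of the temperature
    m = max(scaled)                      # subtract the max before exponentiating
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


sharp = poisson_class_probs(raw=1.0, num_classes=4, tau=0.5)
soft = poisson_class_probs(raw=1.0, num_classes=4, tau=2.0)
print(sharp)
print(soft)
```

Under this assumption both outputs peak at the same class, but the lower temperature concentrates more mass on it, in line with the statement that higher τ gives softer distributions.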
- class dlordinal.output_layers.ResNetOrdinalFullyConnected(input_size: int, num_classes: int)
ResNetOrdinalFullyConnected implements the ordinal fully connected layer.
- Parameters:
input_size (int) – Input size
num_classes (int) – Number of classes
- class dlordinal.output_layers.StickBreakingLayer(input_shape: int, num_classes: int)
Base class to implement the stick breaking layer from Liu et al.[5].
- Parameters:
input_shape (int) – Input shape, which refers to the number of neurons in the last fully connected layer
num_classes (int) – Number of classes
- class dlordinal.output_layers.VGGOrdinalFullyConnected(input_size: int, num_classes: int, activation_function: Callable[[], Module])
VGGOrdinalFullyConnected implements the ordinal fully connected layer.
- Parameters:
input_size (int) – Input size
num_classes (int) – Number of classes
activation_function (Callable[[], nn.Module]) – Activation function