Output Layers

class dlordinal.output_layers.BinomialLayer(*, in_features: int, num_classes: int)

Unimodal output layer for ordinal classification based on the binomial distribution. Proposed by Beckham and Pal[1].

Learns the p parameter of the binomial distribution from the input features and uses the binomial distribution to compute the probabilities of each class, ensuring that the output is unimodal and that the probabilities sum to 1. The sigmoid of the linear layer output is used to ensure that the p parameter is between 0 and 1.
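
The computation described above can be sketched for a single sample in plain Python. This is an illustrative sketch, not the library's implementation: the function name `binomial_class_probs` is hypothetical, and it assumes the binomial distribution uses `num_classes - 1` trials so that its support is 0 to `num_classes - 1`.

```python
import math


def binomial_class_probs(logit: float, num_classes: int) -> list[float]:
    """Binomial probabilities over classes 0..num_classes-1 for one sample."""
    p = 1.0 / (1.0 + math.exp(-logit))  # sigmoid keeps p strictly in (0, 1)
    n = num_classes - 1                 # assumed number of Bernoulli trials
    # Binomial pmf: C(n, k) * p^k * (1 - p)^(n - k)
    return [
        math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
        for k in range(num_classes)
    ]


probs = binomial_class_probs(logit=0.3, num_classes=5)
print(probs)
```

Because the binomial pmf rises to a single mode and then falls, the resulting class probabilities are unimodal for any value of the logit.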

Parameters:

in_features (int) – Number of input features (output features from the previous layer).
num_classes (int) – Number of output classes. Defines the support of the binomial distribution (0 to num_classes - 1).

p_layer

Linear layer that maps input features to a scalar logit.

Type: torch.nn.Linear

num_classes

Number of classes used to define the binomial distribution.

Type: int

Example

>>> import torch
>>> from dlordinal.output_layers import BinomialLayer
>>> layer = BinomialLayer(in_features=5, num_classes=3)
>>> input = torch.randn(2, 5)
>>> probs = layer(input)
>>> print(probs)

forward(input: Tensor) → Tensor

Compute class probabilities using a binomial distribution.

Parameters: input (torch.Tensor, shape (batch_size, in_features)) – Input feature tensor.
Returns: Probability distribution over classes.
Return type: torch.Tensor, shape (batch_size, num_classes)

class dlordinal.output_layers.CLM(num_classes: int, link_function: Literal['logit', 'probit', 'cloglog'], min_distance: int = 0.0, **kwargs)

Implementation of the cumulative link models from Vargas et al.[2] as a torch layer. Different link functions can be used, including logit, probit and cloglog.

Parameters:

num_classes (int) – The number of classes.
link_function (str) – The link function to use. Can be 'logit', 'probit' or 'cloglog'.
min_distance (float, default=0.0) – The minimum distance between thresholds.

num_classes

The number of classes.

Type: int

link_function

The link function to use. Can be 'logit', 'probit' or 'cloglog'.

Type: str

min_distance

The minimum distance between thresholds.

Type: float

dist_

The standard normal distribution N(0, 1) used to compute the probit link function.

Type: torch.distributions.Normal

thresholds_b_

The torch parameter for the first threshold.

Type: torch.nn.Parameter

thresholds_a_

The torch parameter for the alphas of the thresholds.

Type: torch.nn.Parameter
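
To make the roles of the thresholds concrete, here is a minimal plain-Python sketch of a cumulative link model with the logit link for one sample. It is not the library's code. The helper names are hypothetical, the sign convention P(y <= k) = sigmoid(threshold_k - x) is a common choice assumed here, and building each gap as `alpha**2 + min_distance` is an assumed way of keeping the thresholds ordered from a first threshold and a set of alphas.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def ordered_thresholds(first: float, alphas: list[float],
                       min_distance: float = 0.0) -> list[float]:
    """Build non-decreasing thresholds (assumed parameterisation)."""
    out = [first]
    for a in alphas:
        # Each gap is non-negative, so the thresholds never cross.
        out.append(out[-1] + a * a + min_distance)
    return out


def clm_logit_probs(x: float, thresholds: list[float]) -> list[float]:
    """Class probabilities from a scalar projection x and ordered thresholds."""
    # Cumulative probabilities P(y <= k); the last class closes at 1.
    cumulative = [sigmoid(t - x) for t in thresholds] + [1.0]
    # Class probabilities are differences of consecutive cumulative values.
    return [cumulative[0]] + [
        cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))
    ]


thresholds = ordered_thresholds(first=-1.0, alphas=[1.0, 0.8, 1.2])
probs = clm_logit_probs(x=0.4, thresholds=thresholds)  # 4 thresholds, 5 classes
print(probs)
```

With K classes there are K - 1 thresholds, which is why the layer takes a single scalar projection per sample as input.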

Example

>>> import torch
>>> from dlordinal.output_layers import CLM
>>> inp = torch.randn(10, 5)
>>> fc = torch.nn.Linear(5, 1)
>>> clm = CLM(5, "logit")
>>> output = clm(fc(inp))
>>> print(output)
tensor([[0.7944, 0.1187, 0.0531, 0.0211, 0.0127],
        [0.4017, 0.2443, 0.1862, 0.0987, 0.0690],
        [0.4619, 0.2381, 0.1638, 0.0814, 0.0548],
        [0.4636, 0.2378, 0.1632, 0.0809, 0.0545],
        [0.4330, 0.2419, 0.1746, 0.0893, 0.0612],
        [0.5006, 0.2309, 0.1495, 0.0716, 0.0473],
        [0.6011, 0.2027, 0.1138, 0.0504, 0.0320],
        [0.5995, 0.2032, 0.1144, 0.0507, 0.0322],
        [0.4014, 0.2443, 0.1863, 0.0988, 0.0691],
        [0.6922, 0.1672, 0.0838, 0.0351, 0.0217]], grad_fn=<CopySlices>)

forward(x)

Parameters: x (torch.Tensor) – The input tensor.
Returns: output – The class probabilities, with one row per sample and one column per class.
Return type: Tensor

class dlordinal.output_layers.COPOC(phi: Callable[[Tensor], Tensor] = <function COPOC.<lambda>>, psi: Callable[[Tensor], Tensor] = <function COPOC.<lambda>>)

Implements the Conformal Predictions for Ordinal Classification (COPOC) output layer from Dey et al.[3], which enforces unimodality in the output probabilities in a non-parametric way.

Parameters:

phi (Callable[[Tensor], Tensor]) – Non-negative transformation function. Default is absolute value function \(\phi(x)=|x|\).
psi (Callable[[Tensor], Tensor]) – Strictly monotonic decreasing bijective function. Default is negative absolute value function \(\psi(x)=-|x|\).
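
One way to see how phi and psi can yield a unimodal output is the following plain-Python sketch for a single sample. It reflects an assumed reading of the construction (keep the first component, accumulate non-negative increments phi(z_k), then apply psi) and is not taken from the library source. The helper names `copoc_logits` and `softmax` are hypothetical.

```python
import math


def copoc_logits(z, phi=abs, psi=lambda v: -abs(v)):
    """Turn unconstrained scores z into unimodal logits (assumed construction)."""
    # Keep the first score, then add non-negative increments so that
    # the running sequence v is non-decreasing.
    v = [z[0]]
    for z_k in z[1:]:
        v.append(v[-1] + phi(z_k))
    # psi(v) = -|v| rises while v < 0 and falls once v > 0,
    # so the result has a single peak.
    return [psi(v_k) for v_k in v]


def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax(copoc_logits([-2.0, 1.5, -0.7, 3.0, 0.2]))
print(probs)
```

Softmax is order-preserving, so the probabilities inherit the single peak of the logits.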

Example

>>> import torch
>>> from dlordinal.output_layers import COPOC
>>> inp = torch.randn(10, 5)
>>> fc = torch.nn.Linear(5, 5)
>>> copoc = COPOC()
>>> output = torch.nn.functional.softmax(copoc(fc(inp)),dim=1)
>>> print(output)
tensor([[0.1898, 0.1901, 0.2568, 0.2196, 0.1436],
        [0.4538, 0.3191, 0.1412, 0.0529, 0.0330],
        [0.3371, 0.2554, 0.2151, 0.1047, 0.0876],
        [0.1859, 0.2073, 0.2658, 0.1889, 0.1520],
        [0.3306, 0.2195, 0.1982, 0.1303, 0.1214],
        [0.2132, 0.3768, 0.1590, 0.1278, 0.1232],
        [0.1531, 0.1544, 0.2094, 0.2451, 0.2381],
        [0.4986, 0.2240, 0.1689, 0.0590, 0.0495],
        [0.5838, 0.2201, 0.1289, 0.0507, 0.0166],
        [0.1639, 0.1969, 0.2100, 0.2347, 0.1946]], grad_fn=<SoftmaxBackward0>)

forward(x: Tensor) → Tensor

Parameters: x (torch.Tensor) – Input tensor of shape (batch_size, num_classes).
Returns: probs – Unimodal logits of shape (batch_size, num_classes); apply softmax to obtain class probabilities.
Return type: torch.Tensor

class dlordinal.output_layers.GaussianUncertaintyLayer(in_features: int, num_classes: int)

Discretized Gaussian Uncertainty (GU) layer proposed by Araújo et al.[4].

Produces a discrete unimodal distribution over num_classes classes, derived from a Gaussian with learnable mean (mu) and standard deviation (sigma).

Parameters:

in_features (int) – Number of input features (output of the previous layer).
num_classes (int) – Number of discrete output classes.

num_classes

Number of discrete output classes.

Type: int

mu_layer

Linear layer used to predict the mean (mu) of the Gaussian.

Type: nn.Linear

sigma_layer

Linear layer used to predict the standard deviation (sigma).

Type: nn.Linear

Notes

The standard deviation is parameterised in an unconstrained way and transformed using softplus to ensure positivity.
The resulting values are normalised to form a valid probability distribution.
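
The two notes above can be illustrated with a plain-Python sketch for one sample. It is an assumption-laden illustration rather than the library's implementation: the function name is hypothetical, and it assumes the Gaussian is discretized by evaluating its unnormalised density at the class indices 0 to `num_classes - 1` before normalising.

```python
import math


def softplus(z: float) -> float:
    return math.log1p(math.exp(z))


def discretized_gaussian_probs(mu: float, raw_sigma: float, num_classes: int):
    """Discrete unimodal distribution derived from a Gaussian (assumed scheme)."""
    sigma = softplus(raw_sigma)  # unconstrained value mapped to sigma > 0
    # Log of the unnormalised Gaussian density at each class index.
    log_scores = [-0.5 * ((k - mu) / sigma) ** 2 for k in range(num_classes)]
    m = max(log_scores)  # subtract the maximum for numerical stability
    scores = [math.exp(s - m) for s in log_scores]
    total = sum(scores)
    # Normalise so the values form a valid probability distribution.
    return [s / total for s in scores], sigma


probs, sigma = discretized_gaussian_probs(mu=1.2, raw_sigma=0.0, num_classes=4)
print(probs, sigma)
```

In this sketch a smaller sigma concentrates the mass on the class nearest to mu, while a larger sigma spreads it across neighbouring classes.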

Example

>>> import torch
>>> from dlordinal.output_layers import GaussianUncertaintyLayer
>>> layer = GaussianUncertaintyLayer(in_features=5, num_classes=3)
>>> input = torch.randn(2, 5)
>>> probs, sigma = layer(input)
>>> print(probs)

forward(x: Tensor) → tuple[Tensor, Tensor]

Forward pass.

Parameters:

x (torch.Tensor) – Input tensor of shape (batch_size, in_features).

Returns:

probs (torch.Tensor) – Discrete probability distribution over classes. Shape: (batch_size, num_classes).
sigma (torch.Tensor) – Predicted standard deviation for each sample. Shape: (batch_size,).

class dlordinal.output_layers.PoissonLayer(*, in_features: int, num_classes: int, learn_tau: bool = True)

Unimodal output layer for ordinal classification based on the Poisson distribution. Proposed by Beckham and Pal[1].

Learns the λ parameter of the Poisson distribution from the input features and uses the Poisson distribution to compute the probabilities of each class, ensuring that the output is unimodal and that the probabilities sum to 1. The softplus of the linear layer output is used to ensure that the λ parameter is positive. Additionally, its value is clamped between 1e-8 and 1e4 to prevent numerical issues.

The layer includes an optional learnable temperature parameter τ that controls the sharpness of the output distribution. Higher values of τ produce softer distributions, while lower values produce sharper distributions. If learn_tau is set to False, τ is fixed at 1 (no scaling).
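
The effect of λ and τ can be sketched for one sample in plain Python. This is a hedged illustration, not the library's code: the function name is hypothetical, and it assumes the class scores are the Poisson log-probabilities divided by τ and then passed through a softmax over the `num_classes` classes.

```python
import math


def poisson_class_probs(z: float, num_classes: int, tau: float = 1.0) -> list[float]:
    """Poisson-based class probabilities for one sample (assumed formulation)."""
    lam = math.log1p(math.exp(z))   # softplus keeps the rate positive
    lam = min(max(lam, 1e-8), 1e4)  # clamp to the range stated above
    # Log of the Poisson pmf at k = 0..num_classes-1.
    log_pmf = [
        k * math.log(lam) - lam - math.lgamma(k + 1)
        for k in range(num_classes)
    ]
    scaled = [v / tau for v in log_pmf]  # temperature scaling
    m = max(scaled)                      # subtract the maximum for stability
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]     # softmax renormalises over the classes


sharp = poisson_class_probs(z=1.0, num_classes=5, tau=0.5)
soft = poisson_class_probs(z=1.0, num_classes=5, tau=2.0)
print(sharp)
print(soft)
```

Both calls peak at the same class; only the concentration of mass around that class changes with τ.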

Parameters:

in_features (int) – Size of the input feature vector (output features from the previous layer).
num_classes (int) – Number of discrete output classes. Defines support of the distribution as {0, …, num_classes - 1}.
learn_tau (bool, default=True) – If True, the temperature parameter τ is learned as a model parameter. Otherwise, it is stored as a fixed buffer.

lambda_layer

Linear transformation that maps input features to a scalar rate λ.

Type: torch.nn.Linear

log_tau

Log-temperature parameter used to control sharpness of the distribution.

Type: torch.Tensor or torch.nn.Parameter

num_classes

Number of output classes.

Type: int

learn_tau

Whether temperature is learnable.

Type: bool

Example

>>> import torch
>>> from dlordinal.output_layers import PoissonLayer
>>> layer = PoissonLayer(in_features=5, num_classes=3, learn_tau=True)
>>> input = torch.randn(2, 5)
>>> probs = layer(input)
>>> print(probs)

forward(input: Tensor) → Tensor

Compute class probabilities using a Poisson-based discrete distribution.

Parameters: input (torch.Tensor, shape (batch_size, in_features)) – Input feature tensor.
Returns: Probability distribution over discrete classes.
Return type: torch.Tensor, shape (batch_size, num_classes)

class dlordinal.output_layers.ResNetOrdinalFullyConnected(input_size: int, num_classes: int)

Implements the ordinal fully connected layer.

Parameters:

input_size (int) – Input size
num_classes (int) – Number of classes

forward(x: Tensor) → Tensor

Parameters: x (torch.Tensor) – Input tensor

class dlordinal.output_layers.StickBreakingLayer(input_shape: int, num_classes: int)

Base class to implement the stick breaking layer from Liu et al.[5].

Parameters:

input_shape (int) – Input shape, that is, the number of neurons in the last fully connected layer
num_classes (int) – Number of classes

forward(x) → Tensor

Parameters: x (torch.Tensor) – Input tensor
Returns: logits – Logits of the stick breaking layer
Return type: torch.Tensor

get_stick_logits(x: Tensor)

Parameters: x (torch.Tensor) – Input tensor
Returns: logits – Logits of the stick breaking layer
Return type: torch.Tensor

class dlordinal.output_layers.VGGOrdinalFullyConnected(input_size: int, num_classes: int, activation_function: Callable[[], Module])

Implements the ordinal fully connected layer.

Parameters:

input_size (int) – Input size
num_classes (int) – Number of classes
activation_function (Callable[[], nn.Module]) – Activation function

forward(x: Tensor) → Tensor

Parameters: x (torch.Tensor) – Input tensor