Loss functions

class dlordinal.losses.BetaCrossEntropyLoss(*args, **kwargs)[source]

Deprecated since version 2.4.0: Use BetaLoss instead with CrossEntropyLoss as base_loss. Will be removed in 3.0.0.

class dlordinal.losses.BetaLoss(base_loss: Module, num_classes: int, params_set: str | Dict[int, List] = 'standard', eta: float = 1.0)[source]

Beta-regularized loss, as proposed in Vargas et al.[1].

This loss function applies a regularization term based on the Beta distribution to penalize the distance between predicted and true class distributions. It extends the CustomTargetsLoss by incorporating a Beta distribution for soft labelling.

Parameters:
  • base_loss (torch.nn.Module) – The base loss function. It must accept y_true as a probability distribution (e.g., soft labels or one-hot encoded labels). The base loss is applied between the predicted logits (y_pred) and the adjusted target labels (y_true).

  • num_classes (int) – Number of classes.

  • params_set (str or dict[int, list], default="standard") – The set of parameters of the beta distributions employed to generate the soft labels. It can be one of the keys in the _beta_params_sets dictionary. Alternatively, it can be a dictionary with the same structure as the items of the _beta_params_sets dictionary. The keys of the dictionary must be the number of classes and the values must be a list of lists with the parameters of the beta distributions for each class. The list for each class must have three parameters \([p,q,a]\) where \(p\) and \(q\) are the shape parameters of the beta distribution and \(a\) is the scaling parameter. Example: {3: [[1, 4, 1], [4, 4, 1], [4, 1, 1]]} for three classes with the parameters \([1,4,1]\), \([4,4,1]\) and \([4,1,1]\) for each class respectively.

  • eta (float, default=1.0) – A regularization parameter that controls the balance between the base loss and the regularization term. A value of 0 means no regularization, while a value of 1 means the Beta regularization term fully influences the target labels.

Example

>>> import torch
>>> from dlordinal.losses import BetaLoss
>>> from torch.nn import CrossEntropyLoss
>>> num_classes = 5
>>> base_loss = CrossEntropyLoss()
>>> loss = BetaLoss(base_loss, num_classes)
>>> input = torch.randn(3, num_classes)
>>> target = torch.randint(0, num_classes, (3,))
>>> output = loss(input, target)
forward(input: Tensor, target: Tensor) → Tensor

Computes the loss between the input predictions and the target labels.

Parameters:
  • input (torch.Tensor) – A float tensor of shape (N, J) containing predicted logits or probabilities, where N is the batch size and J is the number of classes. The expected format (logits vs probabilities) depends on the specific base loss function.

  • target (torch.Tensor) – An integer tensor of shape (N,) containing the class indices (0 ≤ target < J) corresponding to the correct classes for each sample.

Returns:

A scalar tensor representing the computed loss.

Return type:

torch.Tensor

class dlordinal.losses.BinomialCrossEntropyLoss(*args, **kwargs)[source]

Deprecated since version 2.4.0: Use BinomialLoss instead with CrossEntropyLoss as base_loss. Will be removed in 3.0.0.

class dlordinal.losses.BinomialLoss(base_loss: Module, num_classes: int, eta: float = 1.0)[source]

Binomial-regularized loss, as proposed in Liu et al.[2].

This loss function applies a regularization term based on the Binomial distribution to penalize the distance between predicted and true class distributions. It extends the CustomTargetsLoss by incorporating a Binomial distribution for soft labelling.

Parameters:
  • base_loss (torch.nn.Module) – The base loss function. It must accept y_true as a probability distribution (e.g., soft labels or one-hot encoded labels). This function is used to compute the loss between the predicted logits (y_pred) and the adjusted target labels (y_true).

  • num_classes (int) – The number of classes (J) in the classification task.

  • eta (float, default=1.0) – A regularization parameter that controls the influence of the regularization term. A value of 0 means no regularization, while a value of 1 means the Binomial regularization term fully influences the target labels.

Example

>>> import torch
>>> from dlordinal.losses import BinomialLoss
>>> from torch.nn import CrossEntropyLoss
>>> num_classes = 5
>>> base_loss = CrossEntropyLoss()
>>> loss = BinomialLoss(base_loss, num_classes)
>>> input = torch.randn(3, num_classes)
>>> target = torch.randint(0, num_classes, (3,))
>>> output = loss(input, target)
forward(input: Tensor, target: Tensor) → Tensor

Computes the loss between the input predictions and the target labels.

Parameters:
  • input (torch.Tensor) – A float tensor of shape (N, J) containing predicted logits or probabilities, where N is the batch size and J is the number of classes. The expected format (logits vs probabilities) depends on the specific base loss function.

  • target (torch.Tensor) – An integer tensor of shape (N,) containing the class indices (0 ≤ target < J) corresponding to the correct classes for each sample.

Returns:

A scalar tensor representing the computed loss.

Return type:

torch.Tensor

class dlordinal.losses.CDWCELoss(num_classes, alpha=0.5, weight=None, margin=0.0)[source]

Class Distance Weighted Cross-Entropy Loss, proposed in Polat et al.[3]. This loss function takes the order of the classes into account by applying a distance weighting between the target and predicted classes. The weight applied is determined by the distance between the true and predicted classes, controlled by the alpha parameter.

This loss function is particularly useful for ordinal classification tasks where the order of the classes matters, and penalties should increase as the distance between the true and predicted classes grows.

Parameters:
  • num_classes (int) – The number of classes (J).

  • alpha (float, default=0.5) – Exponent that controls the influence of the class distance in the loss calculation. A higher alpha gives more weight to classes that are farther apart.

  • weight (torch.Tensor, optional, default=None) – A tensor of shape (J,) representing class-specific weights, used to address class imbalance. The weight for each class is applied during loss computation and can be normalised automatically. If None, no class weights are applied.

  • margin (float, default=0.0) – A margin value that encourages a minimum separation between classes.
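The distance weighting described above can be illustrated with a framework-free sketch. It assumes the per-sample form usually given for CDW-CE, namely the sum over classes of -log(1 - p_i) * |i - c|^alpha with c the true class; the weight and margin options are left out, and the helper names softmax and cdw_ce_sample are hypothetical, not part of dlordinal.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cdw_ce_sample(logits, true_class, alpha):
    """Per-sample CDW-CE, assumed form: -sum_i log(1 - p_i) * |i - c|^alpha."""
    probs = softmax(logits)
    loss = 0.0
    for i, p in enumerate(probs):
        if i == true_class:
            continue  # distance 0, so the true class contributes nothing
        loss -= math.log(1.0 - p) * abs(i - true_class) ** alpha
    return loss


# Same confidence, but placed one class away versus four classes away.
near_miss = cdw_ce_sample([0.0, 5.0, 0.0, 0.0, 0.0], true_class=0, alpha=1.0)
far_miss = cdw_ce_sample([0.0, 0.0, 0.0, 0.0, 5.0], true_class=0, alpha=1.0)
print(near_miss < far_miss)  # the distant mistake is penalised more
```

Raising alpha widens the gap between near and far mistakes, which is the behaviour the alpha parameter is documented to control.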

Example

>>> import torch
>>> from dlordinal.losses import CDWCELoss
>>> loss_fn = CDWCELoss(num_classes=5, alpha=1.0)
>>> y_pred = torch.randn(3, 5)
>>> y_true = torch.tensor([0, 3, 1])
>>> loss = loss_fn(y_pred, y_true)
>>> print(loss)
forward(y_pred, y_true)[source]

Computes the Class Distance Weighted Cross-Entropy loss between predicted logits and true labels.

Parameters:
  • y_pred (torch.Tensor) – A tensor of shape (N, J) containing predicted logits, where N is the batch size and J is the number of classes. These logits are typically the raw outputs of a neural network before applying a softmax function.

  • y_true (torch.Tensor) – A tensor containing the ground-truth labels. It can be either:

    • A tensor of shape (N,) with integer class indices (for categorical targets).

    • A tensor of shape (N, J) with one-hot encoded labels (for probabilistic targets).

Returns:

A scalar tensor representing the mean loss over the batch. The result is the average of the loss values computed for each sample in the batch.

Return type:

torch.Tensor

class dlordinal.losses.CORNLoss(num_classes)[source]

Conditional Ordinal Regression for Neural networks (CORN) loss, a rank-consistent ordinal regression method from Shi et al.[4].

See the reference implementation here.

Parameters:

num_classes (int) – The number of classes (J).

Note

CORN loss expects the output of your network to be of dimension J-1 because class 0 is predicted implicitly based on the probabilities of subsequent classes.

CORN loss does not support probabilistic targets.
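Because the network emits J-1 values, a decoding step is needed to recover a class index. The sketch below follows the decoding usually described for CORN: each logit is passed through a sigmoid to give a conditional probability, the probabilities are multiplied cumulatively, and the predicted class is the number of cumulative products above 0.5. The function names are hypothetical, and whether dlordinal ships its own helper for this is not stated in this section.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def corn_predict(logits):
    """Decode J-1 CORN logits into a class index in [0, J-1]."""
    cumulative = 1.0
    label = 0
    for z in logits:
        # Running product approximates P(y > k) from conditional probabilities.
        cumulative *= sigmoid(z)
        if cumulative > 0.5:
            label += 1
    return label


# Five classes, so four logits per sample.
print(corn_predict([5.0, 5.0, -5.0, -5.0]))
print(corn_predict([-5.0, -5.0, -5.0, -5.0]))
```

Class 0 is returned when no cumulative probability exceeds 0.5, which is why it needs no output unit of its own.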

Example

>>> import torch
>>> from dlordinal.losses import CORNLoss
>>> NUM_CLASSES = 5
>>> loss_fn = CORNLoss(num_classes=NUM_CLASSES)
>>> y_pred = torch.randn(3, NUM_CLASSES - 1)
>>> y_true = torch.tensor([0, 3, 1])
>>> loss = loss_fn(y_pred, y_true)
>>> print(loss)
forward(y_pred, y_true)[source]

Computes the CORN loss between predicted logits and true labels.

Parameters:
  • y_pred (torch.Tensor) – A tensor of shape (N, J - 1) containing predicted logits, where N is the batch size and J is the number of classes. These logits are the raw outputs of a neural network; CORN interprets them through a sigmoid rather than a softmax.

  • y_true (torch.Tensor) – A tensor of shape (N,) with integer class indices (for categorical targets).

Returns:

A scalar tensor representing the mean loss over the batch. The result is the average of the loss values computed for each sample in the batch.

Return type:

torch.Tensor

class dlordinal.losses.CustomTargetsCrossEntropyLoss(*args, **kwargs)[source]

Deprecated since version 2.4.0: Use CustomTargetsLoss instead with CrossEntropyLoss as base_loss. Will be removed in 3.0.0.

class dlordinal.losses.CustomTargetsLoss(base_loss: Module, cls_probs: Tensor, eta: float = 1.0)[source]

Base class for implementing a soft labelling loss using class-dependent target smoothing.

This loss modifies the hard class labels by combining one-hot encoding with prior class probabilities. The result is a soft target distribution used as input to a base loss function that supports probabilistic targets (e.g., KL divergence or soft cross-entropy).

The smoothing is controlled by the eta parameter, where eta=0 corresponds to standard one-hot labels and eta=1 corresponds to using only the prior class probabilities.

Parameters:
  • base_loss (torch.nn.Module) – The base loss function to apply between predictions and soft targets. It must accept y_true as a tensor of probabilities, not class indices. Specifically, y_true should be a vector of probabilities or a one-hot encoded vector, where each element represents the probability of the corresponding class.

  • cls_probs (torch.Tensor) – A tensor of shape (J, J), where each row j corresponds to a class-conditional target distribution for class j. This is used to create the soft targets.

  • eta (float, default=1.0) – A scalar in [0, 1] controlling the degree of smoothing applied to the targets. Higher values increase the influence of the prior class distributions.
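The role of eta can be made concrete with a small framework-free sketch. It assumes the soft target is a linear blend, (1 - eta) * one_hot + eta * cls_probs[target], which matches the two documented endpoints (eta=0 gives one-hot labels, eta=1 gives only the prior class probabilities); the exact blend used internally is an assumption, and soft_targets is a hypothetical helper name.

```python
def soft_targets(target, cls_probs, eta):
    """Blend one-hot labels with per-class target distributions.

    Assumed form: (1 - eta) * one_hot + eta * cls_probs[target].
    """
    num_classes = len(cls_probs)
    blended = []
    for t in target:
        one_hot = [1.0 if j == t else 0.0 for j in range(num_classes)]
        row = [(1.0 - eta) * o + eta * p for o, p in zip(one_hot, cls_probs[t])]
        blended.append(row)
    return blended


cls_probs = [
    [0.9, 0.075, 0.025],
    [0.1, 0.6, 0.3],
    [0.05, 0.15, 0.8],
]
targets = soft_targets([0, 2], cls_probs, eta=0.5)
for row in targets:
    print(row)
```

Each blended row still sums to 1, so it remains a valid probability distribution for the base loss.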

Example

>>> import torch
>>> import torch.nn as nn
>>> from dlordinal.losses import CustomTargetsLoss
>>> base_loss_fn = nn.CrossEntropyLoss()
>>> cls_probs = torch.tensor([[0.9, 0.075, 0.025], [0.1, 0.6, 0.3], [0.05, 0.15, 0.8]])
>>> custom_loss_fn = CustomTargetsLoss(base_loss=base_loss_fn, cls_probs=cls_probs, eta=0.5)
>>> y_pred = torch.randn(2, 3)
>>> y_true = torch.tensor([0, 2])
>>> loss = custom_loss_fn(y_pred, y_true)
>>> print(loss)
forward(input: Tensor, target: Tensor) → Tensor[source]

Computes the loss between the input predictions and the target labels.

Parameters:
  • input (torch.Tensor) – A float tensor of shape (N, J) containing predicted logits or probabilities, where N is the batch size and J is the number of classes. The expected format (logits vs probabilities) depends on the specific base loss function.

  • target (torch.Tensor) – An integer tensor of shape (N,) containing the class indices (0 ≤ target < J) corresponding to the correct classes for each sample.

Returns:

A scalar tensor representing the computed loss.

Return type:

torch.Tensor

class dlordinal.losses.EMDLoss(num_classes: int)[source]

Computes the squared Earth Mover’s Distance (EMD) loss, also known as the Ranked Probability Score (RPS), for ordinal classification tasks.

This implementation follows the formulation presented by Hou et al.[5]. The squared EMD loss is equivalent to the RPS described in Epstein[6]. It serves as a proper scoring rule for ordinal outcomes, encouraging probabilistic predictions that are both accurate and calibrated.

Errors farther from the true class are penalised more heavily, reflecting the ordinal structure of the target variable.

Parameters:

num_classes (int) – The number of ordinal classes (denoted as J).
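As a rough illustration of why distant errors cost more, the squared EMD for one sample can be written as the sum of squared differences between the predicted and true cumulative distributions. The sketch below is framework-free and assumes that form with no additional normalisation; the normalisation and reduction used by EMDLoss itself may differ, and the helper names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def squared_emd(logits, true_class):
    """Sum over classes of (predicted CDF - true CDF) squared."""
    probs = softmax(logits)
    cdf_pred = 0.0
    loss = 0.0
    for j, p in enumerate(probs):
        cdf_pred += p
        cdf_true = 1.0 if j >= true_class else 0.0  # step function at the label
        loss += (cdf_pred - cdf_true) ** 2
    return loss


near_miss = squared_emd([0.0, 5.0, 0.0, 0.0, 0.0], true_class=0)
far_miss = squared_emd([0.0, 0.0, 0.0, 0.0, 5.0], true_class=0)
print(near_miss < far_miss)  # mass placed farther from the label costs more
```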

Examples

>>> import torch
>>> from dlordinal.losses import EMDLoss
>>> loss_fn = EMDLoss(num_classes=5)
>>> y_pred = torch.randn(8, 5)  # Predicted logits
>>> y_true = torch.tensor([0, 1, 2, 3, 4, 3, 1, 0])  # Class indices
>>> loss = loss_fn(y_pred, y_true)
forward(y_pred, y_true)[source]

Computes the squared Earth Mover’s Distance (Ranked Probability Score) between predictions and targets.

Parameters:
  • y_pred (torch.Tensor) – The model predictions. Shape: (batch_size, num_classes).

  • y_true (torch.Tensor) – Ground truth labels. Shape:

    • (batch_size,) if labels are class indices.

    • (batch_size, num_classes) if already one-hot encoded.

Returns:

A scalar tensor representing the mean squared EMD loss over the batch.

Return type:

torch.Tensor

class dlordinal.losses.ExponentialCrossEntropyLoss(*args, **kwargs)[source]

Deprecated since version 2.4.0: Use ExponentialLoss instead with CrossEntropyLoss as base_loss. Will be removed in 3.0.0.

class dlordinal.losses.ExponentialLoss(base_loss: Module, num_classes: int, p: float = 1.0, tau: float = 1.0, eta: float = 1.0)[source]

Exponential-regularized loss, as proposed in Vargas et al.[7].

This loss function applies a regularization term based on the Exponential distribution to penalize the distance between predicted and true class distributions. It extends the CustomTargetsLoss by incorporating an Exponential distribution for soft labelling.

Parameters:
  • base_loss (torch.nn.Module) – The base loss function. It must accept y_true as a probability distribution (e.g., soft labels or one-hot encoded labels). The base loss is computed between the predicted logits (y_pred) and the adjusted target labels (y_true).

  • num_classes (int) – The number of classes (J) in the classification task.

  • p (float, default=1.0) – The exponent parameter controlling the shape of the Exponential distribution. This parameter influences the steepness of the regularization.

  • tau (float, default=1.0) – A scaling parameter for the Exponential distribution that affects the regularization term’s influence on the target labels.

  • eta (float, default=1.0) – A regularization parameter that controls the influence of the regularization term. A value of 0 means no regularization, while a value of 1 means the Exponential regularization term fully influences the target labels.
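To give a feel for how p and tau shape the soft labels, the sketch below builds one target row per class as a softmax over -|j - k|^p / tau, where k is the true class. This expression is an assumption about the form of the exponential soft labels, not a statement of the library's internals, and exponential_soft_labels is a hypothetical helper name.

```python
import math


def exponential_soft_labels(num_classes, p=1.0, tau=1.0):
    """One soft-label row per true class: softmax(-|j - k|^p / tau) (assumed form)."""
    rows = []
    for true_class in range(num_classes):
        scores = [-(abs(j - true_class) ** p) / tau for j in range(num_classes)]
        exps = [math.exp(s) for s in scores]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


sharp = exponential_soft_labels(5, p=1.0, tau=1.0)
flat = exponential_soft_labels(5, p=1.0, tau=5.0)
print([round(v, 3) for v in sharp[2]])
print([round(v, 3) for v in flat[2]])
```

Under this assumed form, each row peaks at the true class and decays with distance; a larger tau flattens the row, while a larger p makes the decay steeper for distant classes.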

Example

>>> import torch
>>> from dlordinal.losses import ExponentialLoss
>>> from torch.nn import CrossEntropyLoss
>>> num_classes = 5
>>> base_loss = CrossEntropyLoss()
>>> loss = ExponentialLoss(base_loss, num_classes)
>>> input = torch.randn(3, num_classes)
>>> target = torch.randint(0, num_classes, (3,))
>>> output = loss(input, target)
forward(input: Tensor, target: Tensor) → Tensor

Computes the loss between the input predictions and the target labels.

Parameters:
  • input (torch.Tensor) – A float tensor of shape (N, J) containing predicted logits or probabilities, where N is the batch size and J is the number of classes. The expected format (logits vs probabilities) depends on the specific base loss function.

  • target (torch.Tensor) – An integer tensor of shape (N,) containing the class indices (0 ≤ target < J) corresponding to the correct classes for each sample.

Returns:

A scalar tensor representing the computed loss.

Return type:

torch.Tensor

class dlordinal.losses.GaussianUncertaintyLossWrapper(base_loss: Callable[[Tensor, Tensor], Tensor], alpha: float = 0.5)[source]

Loss wrapper for models using a Gaussian Uncertainty (GU) output layer.

This wrapper augments a base loss function with a regularisation term on the predicted uncertainty (sigma), encouraging the model to avoid unnecessarily large variance estimates.

The total loss is defined as:

total_loss = base_loss(probs, y_true) + (1 - alpha) * mean(sigma^2)

where:

  • probs is the predicted discrete probability distribution

  • sigma is the predicted standard deviation

  • alpha controls the strength of the regularisation

Parameters:
  • base_loss (Callable[[torch.Tensor, torch.Tensor], torch.Tensor]) – Loss function applied to the predicted probabilities and targets. Typically something like nn.CrossEntropyLoss (adapted to probabilities) or another suitable criterion.

  • alpha (float, optional) – Weighting factor between the base loss and the uncertainty penalty. Higher values reduce the impact of the sigma regularisation. Default is 0.5.

base_loss

Wrapped loss function.

Type:

Callable[[torch.Tensor, torch.Tensor], torch.Tensor]

alpha

Regularisation weighting factor.

Type:

float

Notes

  • The wrapper expects the model to return a tuple (probs, sigma).

  • probs should have shape (batch_size, num_classes).

  • sigma should have shape (batch_size,).

  • The regularisation term penalises large uncertainty values.

  • This formulation follows the idea proposed in Araújo et al.[8].
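The combination of base loss and uncertainty penalty can be traced by hand with a framework-free sketch. It assumes the penalty is weighted by (1 - alpha) and added to the base loss, consistent with the parameter description that higher alpha reduces the impact of the sigma term; the base loss here is a plain mean negative log-likelihood chosen only for illustration, and gu_total_loss is a hypothetical name.

```python
import math


def gu_total_loss(probs, sigma, y_true, alpha=0.5):
    """Base loss plus (1 - alpha) * mean(sigma^2).

    The base loss is assumed to be the mean negative log-likelihood of the
    true class; the wrapper itself accepts any callable.
    """
    n = len(y_true)
    base = -sum(math.log(row[t]) for row, t in zip(probs, y_true)) / n
    penalty = sum(s ** 2 for s in sigma) / len(sigma)
    return base + (1.0 - alpha) * penalty


probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
sigma = [0.5, 0.3]
y_true = [0, 1]
total = gu_total_loss(probs, sigma, y_true, alpha=0.5)
print(total)
```

With alpha=1.0 the penalty vanishes and only the base loss remains; with alpha=0.0 the mean squared sigma is added at full weight.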

Example

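>>> import torch
>>> import torch.nn as nn
>>> from dlordinal.losses import GaussianUncertaintyLossWrapper
>>> base_loss = nn.CrossEntropyLoss()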
>>> loss_wrapper = GaussianUncertaintyLossWrapper(base_loss, alpha=0.5)
>>> probs = torch.tensor([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
>>> sigma = torch.tensor([0.5, 0.3])
>>> y_true = torch.tensor([0, 1])
>>> loss = loss_wrapper((probs, sigma), y_true)
>>> print(loss)
forward(y_pred: tuple[Tensor, Tensor], y_true: Tensor) → Tensor[source]

Compute the total loss.

Parameters:
  • y_pred (tuple[torch.Tensor, torch.Tensor]) – Tuple containing:

    • probs: predicted class probabilities, shape (batch_size, num_classes)

    • sigma: predicted standard deviation, shape (batch_size,)

  • y_true (torch.Tensor) – Ground-truth labels or targets. Shape depends on the chosen base loss.

Returns:

Scalar loss value combining the base loss and the uncertainty penalty.

Return type:

torch.Tensor

class dlordinal.losses.GeneralTriangularCrossEntropyLoss(*args, **kwargs)[source]

Deprecated since version 2.4.0: Use GeneralTriangularLoss instead with CrossEntropyLoss as base_loss. Will be removed in 3.0.0.

class dlordinal.losses.GeneralTriangularLoss(base_loss: Module, num_classes: int, alphas: ndarray, eta: float = 1.0)[source]

Generalized triangular loss, as proposed in Vargas et al.[9].

This loss function incorporates a triangular distribution with customizable alpha parameters for each class. It applies a regularization term based on the triangular distribution to penalize the distance between predicted and true class distributions. The GeneralTriangularLoss extends CustomTargetsLoss by using a generalized triangular distribution for soft labelling.

Parameters:
  • base_loss (torch.nn.Module) – The base loss function. It must accept y_true as a probability distribution (e.g., soft labels or one-hot encoded labels). The base loss is computed between the predicted logits (y_pred) and the adjusted target labels (y_true).

  • num_classes (int) – The number of classes (J) in the classification task.

  • alphas (np.ndarray) – A NumPy array containing the alpha parameters for the triangular distribution. The length of this array should be equal to 2 * num_classes. The alpha parameters control the shape of the triangular distribution, influencing the weight given to each class in the regularization.

  • eta (float, default=1.0) – A regularization parameter that controls the influence of the regularization term. A value of 0 means no regularization, while a value of 1 means the triangular regularization term fully influences the target labels.

Example

>>> import torch
>>> from dlordinal.losses import GeneralTriangularLoss
>>> from torch.nn import CrossEntropyLoss
>>> import numpy as np
>>> num_classes = 5
>>> alphas = np.array([0.1, 0.15, 0.1, 0.05, 0.05, 0.1, 0.15, 0.1, 0.05, 0.05])
>>> base_loss = CrossEntropyLoss()
>>> loss = GeneralTriangularLoss(base_loss, num_classes, alphas)
>>> input = torch.randn(3, num_classes)
>>> target = torch.randint(0, num_classes, (3,))
>>> output = loss(input, target)
forward(input: Tensor, target: Tensor) → Tensor

Computes the loss between the input predictions and the target labels.

Parameters:
  • input (torch.Tensor) – A float tensor of shape (N, J) containing predicted logits or probabilities, where N is the batch size and J is the number of classes. The expected format (logits vs probabilities) depends on the specific base loss function.

  • target (torch.Tensor) – An integer tensor of shape (N,) containing the class indices (0 ≤ target < J) corresponding to the correct classes for each sample.

Returns:

A scalar tensor representing the computed loss.

Return type:

torch.Tensor

class dlordinal.losses.GeometricCrossEntropyLoss(*args, **kwargs)[source]

Deprecated since version 2.4.0: Use GeometricLoss instead with CrossEntropyLoss as base_loss. Will be removed in 3.0.0.

class dlordinal.losses.GeometricLoss(base_loss: Module, num_classes: int, alphas: float | list = 0.1, eta: float = 1.0)[source]

Unimodal label smoothing based on the discrete geometric distribution according to Haas and Hüllermeier[10].

Parameters:
  • base_loss (Module) – The base loss function. It must accept y_true as a probability distribution (e.g., one-hot or soft labels).

  • num_classes (int) – Number of classes.

  • alphas (float or list, default=0.1) –

    The smoothing factor(s) for geometric distribution-based unimodal smoothing.

    • Single alpha value: When a single alpha value in the range [0, 1], e.g., 0.1, is provided, all classes will be smoothed equally and symmetrically. This is done by deducting alpha from the actual class, \(1 - \alpha\), and allocating \(\alpha\) to the rest of the classes, decreasing monotonically from the actual class in the form of the geometric distribution.

    • List of alpha values: Alternatively, a list of size num_classes can be provided to specify class-wise symmetric smoothing factors. An example for five classes is: [0.2, 0.05, 0.1, 0.15, 0.1].

    • List of smoothing relations: To control the fraction of the left-over probability mass \(\alpha\) allocated to the left (\(F_l \in [0,1]\)) and right (\(F_r \in [0,1]\)) sides of the true class, with \(F_l + F_r = 1\), a list of smoothing relations of the form \((\alpha, F_l, F_r)\) can be specified. This enables asymmetric unimodal smoothing. An example for five classes is: [(0.2, 0.0, 1.0), (0.05, 0.8, 0.2), (0.1, 0.5, 0.5), (0.15, 0.6, 0.4), (0.1, 1.0, 0.0)].

  • eta (float, default=1.0) – Parameter that controls the influence of the regularisation.

Example

>>> import torch
>>> from dlordinal.losses import GeometricLoss
>>> from torch.nn import CrossEntropyLoss
>>> num_classes = 5
>>> base_loss = CrossEntropyLoss()
>>> loss = GeometricLoss(base_loss, num_classes)
>>> input = torch.randn(3, num_classes)
>>> target = torch.randint(0, num_classes, (3,))
>>> output = loss(input, target)
forward(input: Tensor, target: Tensor) → Tensor

Computes the loss between the input predictions and the target labels.

Parameters:
  • input (torch.Tensor) – A float tensor of shape (N, J) containing predicted logits or probabilities, where N is the batch size and J is the number of classes. The expected format (logits vs probabilities) depends on the specific base loss function.

  • target (torch.Tensor) – An integer tensor of shape (N,) containing the class indices (0 ≤ target < J) corresponding to the correct classes for each sample.

Returns:

A scalar tensor representing the computed loss.

Return type:

torch.Tensor

class dlordinal.losses.MCEAndWKLoss(num_classes: int, C: float = 0.5, wk_penalization_type: str = 'quadratic', weight: Tensor | None = None, reduction: str = 'mean', use_logits=False)[source]

The loss function integrates both MCELoss and WKLoss, concurrently minimising error distances while preventing the omission of classes from predictions.

Parameters:
  • num_classes (int) – Number of classes.

  • C (float, default=0.5) – Weighting factor for WK loss (C) and MCE loss (1-C). Must be between 0 and 1.

  • wk_penalization_type (str, default='quadratic') – The penalization type of WK loss to use (‘quadratic’ or ‘linear’). See WKLoss for more details.

  • weight (Optional[Tensor], default=None) – A manual rescaling weight given to each class. If given, must be a Tensor of size J, where J is the number of classes. Otherwise, it is treated as if having all ones.

  • reduction (str, default='mean') – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied; 'mean': the sum of the output will be divided by the number of elements in the output; 'sum': the output will be summed.

  • use_logits (bool, default=False) – If True, the input will be treated as logits. If False, it will be treated as probabilities.

Example

>>> import torch
>>> from dlordinal.losses import MCEAndWKLoss
>>> num_classes = 5
>>> loss = MCEAndWKLoss(num_classes, C=0.7, use_logits=True)
>>> input = torch.randn(3, num_classes)
>>> target = torch.randint(0, num_classes, (3,))
>>> output = loss(input, target)
forward(input: Tensor, target: Tensor)[source]
Parameters:
  • input (torch.Tensor) – Predicted labels of shape (N, num_classes). If use_logits is True, these are logits. Otherwise, they are probabilities.

  • target (torch.Tensor) – Ground truth labels of shape (N,) where N is the batch size. Values are class indices in the range [0, num_classes-1].

Returns:

loss – A scalar tensor representing the weighted sum of the MCE and WK losses. If reduction is ‘none’, returns a tensor of shape (num_classes,) containing the per-class loss for both MCE and WK losses.

Return type:

torch.Tensor

class dlordinal.losses.MCELoss(num_classes: int, weight: Tensor | None = None, reduction: str = 'mean', use_logits=False)[source]

Mean Squared Error (MSE) loss computed per class. This loss function calculates the MSE for each class independently and then reduces it based on the specified reduction method. It is useful in scenarios where each class needs to be treated independently during the loss computation.

Parameters:
  • num_classes (int) – The number of classes in the classification problem.

  • weight (Optional[Tensor], default=None) – A tensor of size J, where J is the number of classes, representing the weight for each class. If provided, each class’s MSE will be scaled by its corresponding weight. If not provided, all classes are treated with equal weight (i.e., all weights are set to 1).

  • reduction (str, default='mean') – The method to reduce the MSE values across all classes:

    • ‘none’: No reduction is applied. A tensor of MSE values for each class is returned.

    • ‘mean’: The mean of the MSE values across all classes is returned.

    • ‘sum’: The sum of the MSE values across all classes is returned.

  • use_logits (bool, default=False) – If True, the input tensor (predictions) is assumed to be in logits format. If False, the input tensor is treated as probabilities.

Example

>>> import torch
>>> from dlordinal.losses import MCELoss
>>> num_classes = 5
>>> # torch.randn produces logits, so ask the loss to apply softmax internally
>>> loss = MCELoss(num_classes=num_classes, use_logits=True)
>>> input = torch.randn(3, num_classes)
>>> target = torch.randint(0, num_classes, (3,))
>>> output = loss(input, target)

Notes

  • The class supports both the use of logits and probabilities in the predictions.

  • When use_logits=True, the input is passed through a softmax function before computing the MSE. If use_logits=False, the input tensor is expected to already contain probabilities.

compute_per_class_mse(input: Tensor, target: Tensor)[source]

Computes the mean squared error (MSE) for each class independently.

Parameters:
  • input (torch.Tensor) – Predicted labels (either logits or probabilities, depending on use_logits).

  • target (torch.Tensor) – Ground truth labels in one-hot encoding format.

Returns:

mses – A tensor containing the MSE values for each class.

Return type:

torch.Tensor
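The per-class computation can be followed on a tiny batch with a framework-free sketch. It assumes that the MSE for class j is the mean, over the samples in the batch, of the squared difference between the predicted probability and the one-hot target for that class; the averaging dimension is an assumption, and one_hot and per_class_mse are hypothetical helper names.

```python
def one_hot(indices, num_classes):
    """Convert class indices to one-hot rows."""
    return [[1.0 if j == t else 0.0 for j in range(num_classes)] for t in indices]


def per_class_mse(probs, target_onehot):
    """Mean squared error per class, averaged over the batch (assumed)."""
    n = len(probs)
    num_classes = len(probs[0])
    return [
        sum((probs[i][j] - target_onehot[i][j]) ** 2 for i in range(n)) / n
        for j in range(num_classes)
    ]


probs = [[0.8, 0.1, 0.1], [0.2, 0.5, 0.3]]
targets = one_hot([0, 2], num_classes=3)
mses = per_class_mse(probs, targets)
print(mses)  # one value per class; a 'mean' reduction would average these
```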

forward(input: Tensor, target: Tensor)[source]
Parameters:
  • input (torch.Tensor) – Predicted labels. Should be logits if use_logits is True, otherwise probabilities.

  • target (torch.Tensor) – Ground truth labels, typically in class indices.

Returns:

reduced_mse – The MSE per class reduced using the specified reduction method. If reduction=’none’, the MSE values for each class are returned. Otherwise, the MSE is reduced according to the method (mean, sum).

Return type:

torch.Tensor

class dlordinal.losses.OrdinalECOCDistanceLoss(num_classes: int, weights: Tensor | None = None)[source]

Ordinal ECOC distance loss from Barbero-Gómez et al.[11] for use with dlordinal.wrappers.OBDECOCModel. Computes the MSE loss between the output of the model (class threshold probabilities) and the ideal output vector for each class.

Parameters:
  • num_classes (int) – Number of classes.

  • weights (Optional[torch.Tensor]) – Optional weighting for each class. Should be of shape (num_classes,) if provided.

target_class

A tensor of shape (num_classes, num_classes-1) containing the ideal output vectors for each class.

Type:

torch.Tensor
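One plausible layout for the ideal output vectors is the usual ordinal binary decomposition, in which threshold j answers "is the class greater than j?". The sketch below builds such a (num_classes, num_classes-1) matrix in plain Python; that target_class uses exactly this encoding is an assumption, and ideal_ecoc_targets is a hypothetical name.

```python
def ideal_ecoc_targets(num_classes):
    """Row k holds k ones followed by zeros: entry j is 1 when class k > j."""
    return [
        [1.0 if threshold < cls else 0.0 for threshold in range(num_classes - 1)]
        for cls in range(num_classes)
    ]


for row in ideal_ecoc_targets(4):
    print(row)
```

Under this encoding, adjacent classes differ in a single position, so the squared distance between a predicted threshold vector and each row grows with ordinal distance.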

weights

A tensor of shape (num_classes,) containing the class-specific weights.

Type:

Optional[torch.Tensor]

forward(input, target)[source]
Parameters:
  • input (torch.Tensor) – Predicted probabilities for each class threshold, with shape (batch_size, num_classes - 1).

  • target (torch.Tensor) – Ground truth labels of shape (batch_size,). The labels are integer class indices in the range [0, num_classes-1].

Returns:

loss – A scalar tensor representing the computed loss. If weights is None, the loss is computed as the sum of the MSE between input and the target vector for each class. If weights is provided, the loss is computed as the weighted sum of the per-sample MSE losses.

Return type:

torch.Tensor

class dlordinal.losses.PoissonCrossEntropyLoss(*args, **kwargs)[source]

Deprecated since version 2.4.0: Use PoissonLoss instead with CrossEntropyLoss as base_loss. Will be removed in 3.0.0.

class dlordinal.losses.PoissonLoss(base_loss: Module, num_classes: int, eta: float = 1.0)[source]

Poisson unimodal regularised cross-entropy loss from Liu et al.[2].

This loss combines a base loss function (typically cross-entropy) with a Poisson regularisation term to improve classification performance in certain tasks, as described in the referenced paper. The base loss is applied to the probability distribution of the target labels, and the Poisson regularisation encourages the model to produce more balanced probability distributions.

Parameters:
  • base_loss (torch.nn.Module) – The base loss function (e.g., CrossEntropyLoss). It must accept y_true as a probability distribution (e.g., one-hot encoded or soft labels).

  • num_classes (int) – Number of classes (i.e., the size of the probability distribution over the classes).

  • eta (float, default=1.0) – Regularisation parameter that controls the influence of the Poisson term. A value of 0 means no regularisation, while a value of 1 means the Poisson regularisation term fully influences the target labels. Smaller values reduce the impact of the regularisation.

Example

>>> import torch
>>> from dlordinal.losses import PoissonLoss
>>> from torch.nn import CrossEntropyLoss
>>> num_classes = 5
>>> base_loss = CrossEntropyLoss()
>>> loss = PoissonLoss(base_loss, num_classes)
>>> input = torch.randn(3, num_classes)  # Predicted logits for 3 samples
>>> target = torch.randint(0, num_classes, (3,))  # Ground truth class indices
>>> output = loss(input, target)  # Compute the loss
>>> print(output)
forward(input: Tensor, target: Tensor) → Tensor

Computes the loss between the input predictions and the target labels.

Parameters:
  • input (torch.Tensor) – A float tensor of shape (N, J) containing predicted logits or probabilities, where N is the batch size and J is the number of classes. The expected format (logits vs probabilities) depends on the specific base loss function.

  • target (torch.Tensor) – An integer tensor of shape (N,) containing the class indices (0 ≤ target < J) corresponding to the correct classes for each sample.

Returns:

A scalar tensor representing the computed loss.

Return type:

torch.Tensor

class dlordinal.losses.SLACELoss(alpha: float, num_classes: int, weight: Tensor | None = None, use_logits: bool = True)[source]

Implements the SLACE (Soft Labels Accumulating Cross Entropy) loss from Nachmani et al.[12].

Ordinal regression assigns objects to classes that have a natural order, where the severity of a prediction error depends on how far it is from the true class (e.g., classifying ‘No Risk’ as ‘Critical Risk’ is worse than classifying it as ‘High Risk’).

SLACE is an ordinality-aware loss designed to ensure the model’s output is as close as possible to the correct class, considering the order of labels.

It provably satisfies two key properties for ordinal losses: monotonicity and balance sensitivity.

The mechanism involves generating a smooth, ordinally-weighted target probability distribution (‘softmax_targets’) and applying cross-entropy to an accumulated version of the model’s predicted distribution (‘accumulating_softmax’).

Parameters:
  • alpha (float) – Scaling factor controlling the ‘smoothness’ of the softmax target distribution. A higher alpha results in a sharper distribution.

  • num_classes (int) – The total number of ordinal classes (C).

  • weight (Optional[torch.Tensor], default=None) – Optional class weights of shape [num_classes] to handle class imbalance.

  • use_logits (bool, default=True) – If True, assumes ‘input’ contains logits and applies softmax internally. If False, assumes ‘input’ is already probabilities.

prox_dom

The precomputed ordinal dominance matrix used for probability accumulation. Registered as a buffer.

Type:

Optional[torch.Tensor]

forward(input: Tensor, target: Tensor) → Tensor[source]

Calculates the SLACE loss between the model’s prediction and the ordinal target distribution.

Parameters:
  • input (torch.Tensor) – The model’s output (logits or probabilities) with shape [Batch, num_classes].

  • target (torch.Tensor) – The true ordinal labels with shape [Batch] or [Batch, 1].

Returns:

The scalar mean value of the SLACE loss.

Return type:

torch.Tensor

class dlordinal.losses.SORDLoss(alpha: float, num_classes: int, train_targets: Tensor, prox: bool = False, ftype: str = 'max', weight: Tensor | None = None, use_logits: bool = True)[source]

Implements the SORD (Softmax-based Ordinal Regression Distribution) Loss from Diaz and Marathe[13].

SORD Loss generates a smooth, ordinally-weighted target distribution (‘softmax_targets’) and applies standard Cross-Entropy Loss (or KL Divergence) to the model’s prediction. The target distribution is based on the distance from the true target and can be further customized using proximity measures.

This loss belongs to the family of ordinal losses designed to penalize errors based on the severity of the ordinal distance.
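
The soft target distribution described above can be sketched as a softmax over negative ordinal distances. This is a minimal NumPy illustration, not the library's implementation: the helper name `sord_soft_targets` is hypothetical, and the penalty is assumed to be `alpha` times the absolute class distance (the plain L1 case, without proximity matrices).

```python
import numpy as np


def sord_soft_targets(target, num_classes, alpha):
    """Soft ordinal targets as a softmax over negative scaled distances.

    Hypothetical helper for illustration; assumes the penalty is
    alpha * |class - true class| (plain L1 distance).
    """
    classes = np.arange(num_classes)
    target = np.asarray(target)
    # Distance of every class to the true class, one row per sample
    phi = alpha * np.abs(classes[None, :] - target[:, None]).astype(float)
    logits = -phi
    logits = logits - logits.max(axis=1, keepdims=True)  # stabilise the exponentials
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)


soft = sord_soft_targets([0, 2], num_classes=5, alpha=1.0)
sharp = sord_soft_targets([0, 2], num_classes=5, alpha=3.0)
print(soft.round(3))
print(sharp.round(3))
```

Each row sums to one and peaks at the true class. Raising `alpha` concentrates more mass on that class, which matches the description of a higher alpha giving a sharper distribution.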

Parameters:
  • alpha (float) – Scaling factor controlling the ‘smoothness’ of the softmax target distribution. A higher alpha results in a sharper distribution.

  • num_classes (int) – The total number of ordinal classes (C).

  • train_targets (torch.Tensor) – The target labels from the training dataset, required to compute class counts and initialize the proximity matrix (prox_mat).

  • prox (bool, default=False) – If True, enables the use of class-frequency-based proximity matrices (prox_mat) instead of simple L1 distance.

  • ftype (str, default="max") – Defines the function used to convert the proximity matrix into the final penalty (phi). Only used if prox is True. Options include: “max”, “norm_max”, “log”, “norm_log”, “division”, “norm_division”.

  • weight (Optional[torch.Tensor], default=None) – Optional class weights of shape [num_classes] to handle class imbalance.

  • use_logits (bool, default=True) – If True, applies F.log_softmax to the input for numerical stability. If False, assumes input is probabilities and applies log(input + 1e-9).
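
The two input modes of `use_logits` can be illustrated with a small NumPy sketch. The helper names `log_probs` and `soft_cross_entropy` are hypothetical, and the loss is assumed to be a batch-averaged cross-entropy against soft targets. Only the `1e-9` offset is taken from the parameter description above.

```python
import numpy as np


def log_probs(inputs, use_logits=True):
    """Return log-probabilities from logits or from probabilities (illustrative)."""
    inputs = np.asarray(inputs, dtype=float)
    if use_logits:
        # Log-softmax: subtract the row maximum before exponentiating
        shifted = inputs - inputs.max(axis=1, keepdims=True)
        return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Probabilities: a small offset guards against log(0)
    return np.log(inputs + 1e-9)


def soft_cross_entropy(inputs, soft_targets, use_logits=True):
    """Mean cross-entropy between predictions and soft targets (assumed form)."""
    per_sample = -np.sum(np.asarray(soft_targets) * log_probs(inputs, use_logits), axis=1)
    return float(np.mean(per_sample))


logits = np.array([[2.0, 0.5, -1.0]])
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
soft_targets = np.array([[0.7, 0.2, 0.1]])

loss_from_logits = soft_cross_entropy(logits, soft_targets, use_logits=True)
loss_from_probs = soft_cross_entropy(probs, soft_targets, use_logits=False)
print(loss_from_logits, loss_from_probs)

# Extreme logits remain finite on the log-softmax path
extreme = log_probs(np.array([[1000.0, 0.0, -1000.0]]), use_logits=True)
```

For well-behaved inputs the two paths agree closely. The log-softmax path is the safer choice when the model outputs raw scores, since it never takes the logarithm of an underflowed probability.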

prox_mat

The precomputed proximity matrix based on training set class frequencies. Used when prox is True.

Type:

Optional[torch.Tensor]

norm_prox_mat

The L1-normalized version of prox_mat.

Type:

Optional[torch.Tensor]

forward(input: Tensor, target: Tensor) → Tensor[source]

Calculates the SORD loss between the model’s prediction and the ordinal target distribution.

Parameters:
  • input (torch.Tensor) – The model’s output (logits or probabilities) with shape [Batch, C].

  • target (torch.Tensor) – The true ordinal labels with shape [Batch].

Returns:

The scalar mean value of the SORD loss.

Return type:

torch.Tensor

class dlordinal.losses.TriangularCrossEntropyLoss(*args, **kwargs)[source]

Deprecated since version 2.4.0: Use TriangularLoss instead with CrossEntropyLoss as base_loss. Will be removed in 3.0.0.

class dlordinal.losses.TriangularLoss(base_loss: Module, num_classes: int, alpha2: float = 0.05, eta: float = 1.0)[source]

Triangular regularised loss from Vargas et al.[14].

This loss function combines a base loss function (such as cross-entropy) with a triangular regularisation term, which distributes probabilities to adjacent classes. The parameter alpha2 controls the amount of probability deposited into adjacent classes, and eta controls the strength of the regularisation.

Parameters:
  • base_loss (torch.nn.Module) – The base loss function (e.g., CrossEntropyLoss). It must accept y_true as a probability distribution (e.g., one-hot or soft labels).

  • num_classes (int) – Number of classes. This defines the size of the probability distribution.

  • alpha2 (float, default=0.05) – Parameter that controls the amount of probability deposited in adjacent classes. Higher values increase the contribution of adjacent classes.

  • eta (float, default=1.0) – Regularisation parameter that controls the influence of the triangular regularisation term. A value of 1.0 gives equal weight to the base loss and the triangular term, while smaller values reduce the regularisation strength.

Example

>>> import torch
>>> from dlordinal.losses import TriangularLoss
>>> from torch.nn import CrossEntropyLoss
>>> num_classes = 5
>>> base_loss = CrossEntropyLoss()
>>> loss = TriangularLoss(base_loss, num_classes)
>>> input = torch.randn(3, num_classes)  # Predicted logits for 3 samples
>>> target = torch.randint(0, num_classes, (3,))  # Ground truth class indices
>>> output = loss(input, target)  # Compute the loss
>>> print(output)

forward(input: Tensor, target: Tensor) → Tensor

Computes the loss between the input predictions and the target labels.

Parameters:
  • input (torch.Tensor) – A float tensor of shape (N, J) containing predicted logits or probabilities, where N is the batch size and J is the number of classes. The expected format (logits vs probabilities) depends on the specific base loss function.

  • target (torch.Tensor) – An integer tensor of shape (N,) containing the class indices (0 ≤ target < J) corresponding to the correct classes for each sample.

Returns:

A scalar tensor representing the computed loss.

Return type:

torch.Tensor

class dlordinal.losses.WKLoss(num_classes: int, penalization_type: str = 'quadratic', weight: Tensor | None = None, epsilon: float | None = 1e-10, use_logits=False, use_logarithm=False)[source]

Implements Weighted Kappa Loss, introduced by de la Torre et al.[15] and modified by Vargas et al.[16]. Weighted Kappa is widely used in ordinal classification problems. In its original proposal, the loss values lie in \((-\infty, \log 2]\), whereas in the version proposed by Vargas et al.[16] the range is \([0, 2]\).

Following the definition of Vargas et al.[16], the loss is computed as follows:

\[\mathcal{L}(X, \mathbf{y}) = \frac{\sum\limits_{i=1}^J \sum\limits_{j=1}^J \omega_{i,j} \sum\limits_{k=1}^N q_{k,i} ~ p_{y_k,j}} {\frac{1}{N}\sum\limits_{i=1}^J \sum\limits_{j=1}^J \omega_{i,j} \left( \sum\limits_{k=1}^N q_{k,i} \right) \left( \sum\limits_{k=1}^N p_{y_k, j} \right)}\]

where \(q_{k,j}\) denotes the normalised predicted probability, computed as:

\[q_{k,j} = \frac{\text{P}(\text{y} = j ~|~ \mathbf{x}_k)} {\sum\limits_{i=1}^J \text{P}(\text{y} = i ~|~ \mathbf{x}_k)},\]

\(p_{y_k,j}\) is the \(j\)-th element of the one-hot encoded true label for sample \(k\), and \(\omega\) is the penalisation matrix, defined either linearly or quadratically. Its elements are:

  • Linear: \(\omega_{i,j} = \frac{|i - j|}{J - 1}\)

  • Quadratic: \(\omega_{i,j} = \frac{(i - j)^2}{(J - 1)^2}\)
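
The two penalisation matrices can be built directly from their definitions. The sketch below is a NumPy illustration; the helper name `penalisation_matrix` is hypothetical and not part of the library.

```python
import numpy as np


def penalisation_matrix(num_classes, penalization_type="quadratic"):
    """Build the J x J penalisation matrix omega (illustrative helper)."""
    idx = np.arange(num_classes)
    diff = np.abs(idx[:, None] - idx[None, :])  # |i - j| for every pair of classes
    if penalization_type == "linear":
        return diff / (num_classes - 1)
    if penalization_type == "quadratic":
        return diff**2 / (num_classes - 1) ** 2
    raise ValueError(f"Unknown penalization_type: {penalization_type!r}")


omega_lin = penalisation_matrix(5, "linear")
omega_quad = penalisation_matrix(5, "quadratic")
print(omega_lin[0])   # penalties for class 0 against every other class
print(omega_quad[0])
```

Both matrices are zero on the diagonal and reach 1 for the two most distant classes. The quadratic form penalises near misses less and distant errors relatively more than the linear form.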

When considering the original definition of Weighted Kappa, the loss can be defined as follows:

\[\mathcal{L}(X, \mathbf{y}) = \log\left( \frac{\sum\limits_{i=1}^J \sum\limits_{j=1}^J \omega_{i,j} \sum\limits_{k=1}^N q_{k,i} ~ p_{y_k,j}} {\frac{1}{N}\sum\limits_{i=1}^J \sum\limits_{j=1}^J \omega_{i,j} \left( \sum\limits_{k=1}^N q_{k,i} \right) \left( \sum\limits_{k=1}^N p_{y_k, j} \right)} \right)\]

The parameter use_logarithm can be set to True to use this version of the loss. The numerical instability caused by the logarithm is mitigated by adding a small value epsilon to the denominator.
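
Both variants of the loss can be sketched in NumPy by following the formulas term by term. This is an illustration under stated assumptions rather than the library's code: the function name `wk_loss_sketch` is hypothetical, inputs are assumed to be non-negative scores per class, and `epsilon` is assumed to be added to the denominator in both variants.

```python
import numpy as np


def wk_loss_sketch(probs, target, penalization_type="quadratic",
                   use_logarithm=False, epsilon=1e-10):
    """Weighted Kappa loss following the formulas above (illustrative only)."""
    probs = np.asarray(probs, dtype=float)
    n, j = probs.shape
    q = probs / probs.sum(axis=1, keepdims=True)  # normalised predictions q_{k,i}
    p = np.eye(j)[np.asarray(target)]             # one-hot true labels p_{y_k,j}

    idx = np.arange(j)
    diff = np.abs(idx[:, None] - idx[None, :])
    if penalization_type == "linear":
        omega = diff / (j - 1)
    else:
        omega = diff**2 / (j - 1) ** 2

    # Numerator: sum_ij omega_ij * sum_k q_ki * p_kj
    numerator = np.sum(omega * (q.T @ p))
    # Denominator: (1/N) * sum_ij omega_ij * (sum_k q_ki) * (sum_k p_kj)
    denominator = np.sum(omega * np.outer(q.sum(axis=0), p.sum(axis=0))) / n
    ratio = numerator / (denominator + epsilon)
    return float(np.log(ratio)) if use_logarithm else float(ratio)


target = np.array([0, 1, 2])
perfect = np.eye(3)[target]         # every sample predicted correctly
reversed_ = np.eye(3)[target[::-1]]  # class order fully inverted

loss_perfect = wk_loss_sketch(perfect, target)
loss_reversed = wk_loss_sketch(reversed_, target)
log_loss_reversed = wk_loss_sketch(reversed_, target, use_logarithm=True)
print(loss_perfect, loss_reversed, log_loss_reversed)
```

On this toy batch the plain variant gives 0 for perfect predictions and approximately 2 for fully inverted ones, the two ends of the \([0, 2]\) range. The logarithmic variant gives approximately \(\log 2\) for the inverted case and, in this sketch, diverges to \(-\infty\) when the numerator is exactly zero.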

Parameters:
  • num_classes (int) – The number of unique classes in your dataset.

  • penalization_type (str, default='quadratic') – The penalisation method used to compute the Kappa statistic. Valid options are ['linear', 'quadratic'].

  • epsilon (float, default=1e-10) – Small value added to the denominator to avoid division by zero.

  • weight (Optional[torch.Tensor], default=None) – Class weights to apply during loss computation. Should be a tensor of size (num_classes,). If None, equal weight is given to all classes.

  • use_logits (bool, default=False) – If True, the input is treated as logits. If False, the input is treated as probabilities.

  • use_logarithm (bool, default=False) – If True, the logarithm of the Weighted Kappa is computed, following the original definition by de la Torre et al.[15].

Example

>>> import torch
>>> from dlordinal.losses import WKLoss
>>> num_classes = 5
>>> input = torch.randn(3, num_classes)  # Predicted logits for 3 samples
>>> target = torch.randint(0, num_classes, (3,))  # Ground truth class indices
>>> loss_fn = WKLoss(num_classes, use_logits=True)  # input holds logits, so enable use_logits
>>> loss = loss_fn(input, target)
>>> print(loss)

forward(input, target)[source]

Forward pass for the Weighted Kappa loss.

This method computes the Weighted Kappa loss between the predicted and true labels. The loss is based on the weighted disagreement between predictions and true labels, normalised by the expected disagreement under independence.

Parameters:
  • input (torch.Tensor) – The model predictions. Shape: (batch_size, num_classes). If use_logits=True, these should be raw logits (unnormalised scores). If use_logits=False, these should be probabilities (rows summing to 1).

  • target (torch.Tensor) – Ground truth labels, with shape (batch_size,) if given as class indices or (batch_size, num_classes) if already one-hot encoded. The tensor is converted to float internally.

Returns:

loss – A scalar tensor representing the weighted disagreement between predictions and true labels, normalised by the expected disagreement.

Return type:

torch.Tensor