Dataset loading utilities

class dlordinal.datasets.Adience(root: str | Path, train: bool = True, ranges: list = [(0, 2), (4, 6), (8, 13), (15, 20), (25, 32), (38, 43), (48, 53), (60, 100)], test_size: float = 0.2, transform: Callable | None = None, target_transform: Callable | None = None, verbose: bool = False)[source]

Base class for the Adience dataset.

Parameters:
  • root (Union[str, Path]) – Root directory where the datasets are stored. The Adience dataset is expected under the adience directory inside the root directory. The adience directory must contain: 1) aligned.tar.gz, a tar.gz archive containing the images; 2) folds, a directory containing the folds. Each fold is expected to be a file named fold_{f}_data.txt, where f is the fold number, starting from 0. These files can be downloaded from the Adience website (https://talhassner.github.io/home/projects/Adience/Adience-data.html).

  • train (bool, optional, default = True) – Whether to use the training or test partition.

  • ranges (list, optional) – List of age ranges to use, by default [(0, 2), (4, 6), (8, 13), (15, 20), (25, 32), (38, 43), (48, 53), (60, 100)].

  • test_size (float, optional, default = 0.2) – Fraction of the dataset to use for testing.

  • transform (Callable, optional) – A callable that takes in a PIL image and returns a transformed version.

  • target_transform (Callable, optional) – A callable that takes in the target and transforms it.

  • verbose (bool, optional, default = False) – Whether to print progress messages.
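
The directory layout described for root can be checked before constructing the dataset. The following is a minimal sketch, not part of dlordinal: the helper names are hypothetical, and the number of folds (five, numbered 0 to 4) is an assumption that should be adjusted to match the downloaded files.

```python
from pathlib import Path

# Assumption: five fold files (fold_0 ... fold_4); change this if your copy differs.
NUM_FOLDS = 5


def expected_adience_files(root, num_folds=NUM_FOLDS):
    """List the paths that the documentation expects under ``root``."""
    base = Path(root) / "adience"
    files = [base / "aligned.tar.gz"]
    files += [base / "folds" / f"fold_{f}_data.txt" for f in range(num_folds)]
    return files


def missing_adience_files(root, num_folds=NUM_FOLDS):
    """Return the expected paths that do not exist yet."""
    return [p for p in expected_adience_files(root, num_folds) if not p.exists()]
```

Calling missing_adience_files("data") before instantiating Adience gives an explicit list of what still needs to be downloaded.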

root

Root directory where the datasets are stored.

Type:

Path

train

Whether to use the training or test partition.

Type:

bool

ranges

List of age ranges to use to define the categories.

Type:

list

test_size

Fraction of the dataset to use for testing.

Type:

float

transform

A callable that takes in a PIL image and returns a transformed version.

Type:

Callable

target_transform

A callable that takes in the target and transforms it.

Type:

Callable

verbose

Whether to print progress messages.

Type:

bool

data

List of image paths.

Type:

list

targets

Contains the target of each sample contained in the dataset.

Type:

list

classes

Unique classes in the dataset.

Type:

list
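
To illustrate how ranges can define the categories, here is a small sketch that maps an age to the index of the range containing it. The function name is hypothetical, and two details are assumptions rather than documented behaviour: both bounds are treated as inclusive, and ages that fall in a gap between ranges yield None.

```python
DEFAULT_RANGES = [
    (0, 2), (4, 6), (8, 13), (15, 20),
    (25, 32), (38, 43), (48, 53), (60, 100),
]


def age_to_class(age, ranges=DEFAULT_RANGES):
    """Return the index of the first range containing ``age``, or None.

    Assumption: both ends of each range are inclusive.
    """
    for index, (low, high) in enumerate(ranges):
        if low <= age <= high:
            return index
    return None  # the age falls in a gap between ranges
```

With the default ranges there are eight ordered classes, indexed 0 to 7 in this sketch.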

class dlordinal.datasets.FGNet(root: str | Path, download: bool = True, target_size: tuple = (128, 128), categories: list = [3, 11, 16, 24, 40], test_size: float = 0.2, validation_size: float = 0.15, train: bool = True, transform: Callable | None = None, target_transform: Callable | None = None)[source]

Base class for the FGNet dataset.

root

Root directory of the dataset.

Type:

Path

target_size

Size of the images after resizing.

Type:

tuple

categories

List of categories to be used.

Type:

list

test_size

Size of the test set.

Type:

float

validation_size

Size of the validation set.

Type:

float

transform

A function/transform that takes in a PIL image and returns a transformed version.

Type:

callable, optional

target_transform

A function/transform that takes in the target and transforms it.

Type:

callable, optional

data

Dataframe containing the dataset.

Type:

pd.DataFrame

Parameters:
  • root (str or Path) – Root directory of the dataset.

  • download (bool, optional, default = True) – If True, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.

  • target_size (tuple, optional) – Size of the images after resizing. Default is (128, 128).

  • categories (list, optional) – List of categories to be used. Default is [3, 11, 16, 24, 40].

  • test_size (float, optional) – Size of the test set. Default is 0.2.

  • validation_size (float, optional) – Size of the validation set. Default is 0.15.

  • train (bool, optional) – If True, returns the training dataset, otherwise returns the test dataset. Default is True.

  • transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version.

  • target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

property classes: List[int]

Return the unique classes in the dataset.

Returns:

List of unique classes.

Return type:

list

download() → None[source]

Download the FGNet dataset and extract it.

find_category(real_age)[source]

Find the category of the real age.

Parameters:

real_age (int) – Real age of the image.
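
One way to picture the mapping from a real age to a category is to treat the values in categories as ascending thresholds. This is a sketch under that assumption, not the library's implementation: the function name is hypothetical, and whether an age equal to a threshold belongs to the lower or the upper category is a choice made here (upper).

```python
from bisect import bisect_right


def find_category_sketch(real_age, categories=(3, 11, 16, 24, 40)):
    """Count how many thresholds are less than or equal to ``real_age``.

    Assumption: ``categories`` is sorted in ascending order, and an age
    equal to a threshold is placed in the upper category.
    """
    return bisect_right(categories, real_age)
```

Under this reading, five thresholds produce six ordered categories, numbered 0 to 5.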

get_age_from_filename(filename)[source]

Get the age from the filename.

Parameters:

filename (str) – Filename of the image.
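
The sketch below shows how an age could be parsed from a filename. It assumes filenames of the form 001A02.JPG (subject identifier, the letter A, the age, and an optional trailing letter); this naming convention is an assumption and is not stated in this documentation, and the function name is hypothetical.

```python
import re
from pathlib import Path

# Assumed pattern: digits (subject id), "A" or "a", then the age in digits.
_AGE_PATTERN = re.compile(r"^\d+[Aa](\d+)")


def age_from_filename_sketch(filename):
    """Extract the age encoded in a filename such as ``001A02.JPG``."""
    match = _AGE_PATTERN.match(Path(filename).stem)
    if match is None:
        raise ValueError(f"Cannot parse an age from {filename!r}")
    return int(match.group(1))
```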

load_data(original_path: Path)[source]

Load the data from the original_path.

Parameters:

original_path (Path) – Path to the original dataset.

process(original_path, processed_path)[source]

Process the FGNet dataset and save it in the processed_path.

Parameters:
  • original_path (Path) – Path to the original dataset.

  • processed_path (Path) – Path to save the processed dataset.

process_images_from_df(df: DataFrame, original_path: Path, processed_path: Path)[source]

Process the images from the dataframe.

Parameters:
  • df (pd.DataFrame) – Dataframe with the images.

  • original_path (Path) – Path to the original dataset.

  • processed_path (Path) – Path to save the processed dataset.

split(original_csv_path: Path, train_csv_path: Path, test_csv_path: Path, original_images_path: Path, train_images_path: Path, test_images_path: Path)[source]

Split the FGNet dataset into train and test sets.

Parameters:
  • original_csv_path (Path) – Path to the original csv file.

  • train_csv_path (Path) – Path to save the train csv file.

  • test_csv_path (Path) – Path to save the test csv file.

  • original_images_path (Path) – Path to the original images.

  • train_images_path (Path) – Path to save the train images.

  • test_images_path (Path) – Path to save the test images.

split_dataframe(csv_path: Path, train_images_path: Path, original_images_path: Path, test_images_path: Path)[source]

Split the dataframe into train and test sets.

Parameters:
  • csv_path (Path) – Path to the csv file.

  • train_images_path (Path) – Path to save the train images.

  • original_images_path (Path) – Path to the original images.

  • test_images_path (Path) – Path to save the test images.

property targets: List[int]

Return the targets of the dataset.

Returns:

List of targets.

Return type:

list

class dlordinal.datasets.FeatureDataset(filename)[source]

Dataset torch implementation for a standard dataset that contains several features that are organised in a tabular way in a csv file. The last column is the target variable.

Example

>>> from torch.utils.data import DataLoader
>>> from dlordinal.datasets import FeatureDataset
>>> # Compute the normalisation statistics on the training data
>>> train_data = FeatureDataset("train.csv")
>>> train_data.normalize_X()
>>> train_data.normalize_y()
>>> train_loader = DataLoader(train_data, batch_size=32, shuffle=True)
>>> for X, y in train_loader:
...     print(X.shape, y.shape)
>>> # Reuse the training statistics to normalise the test data
>>> test_data = FeatureDataset("test.csv")
>>> test_data.normalize_X(train_data.X_mean, train_data.X_scale)
>>> test_data.normalize_y(train_data.y_mean, train_data.y_scale)
>>> test_loader = DataLoader(test_data, batch_size=32, shuffle=False)
>>> for X, y in test_loader:
...     print(X.shape, y.shape)

get_valid_shape_array(v: ArrayLike)[source]

Convert the input ArrayLike object to a 2D numpy array with shape (n, 1) if it is a 1D array.

Parameters:

v (ArrayLike) – Input array.

Returns:

v – 2D numpy array with shape (n, 1).

Return type:

np.ndarray
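
The reshaping described above can be sketched with NumPy as follows. The function name is hypothetical and this is not the library's code; inputs that are already 2D are returned unchanged in this sketch.

```python
import numpy as np


def to_column_sketch(v):
    """Return ``v`` as an array, reshaping 1D input to shape (n, 1)."""
    arr = np.asarray(v)
    if arr.ndim == 1:
        arr = arr.reshape(-1, 1)  # n values become n rows, one column
    return arr
```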

normalize_X(mean: ArrayLike | None = None, scale: ArrayLike | None = None)[source]

Standardize the features of the dataset. If mean and scale are not provided, they are computed from the dataset. If they are provided, they are used to standardize the dataset.

Parameters:
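  • mean (array-like, default=None) – Mean subtracted from the features. If None, it is computed from the dataset.

  • scale (array-like, default=None) – Scale by which the features are divided. If None, it is computed from the dataset.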

Returns:

self – The dataset with standardized features.

Return type:

FeatureDataset

Example

>>> train_data = FeatureDataset("train.csv")
>>> train_data.normalize_X()
>>> test_data = FeatureDataset("test.csv")
>>> test_data.normalize_X(train_data.X_mean, train_data.X_scale)
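
The reason for passing the training statistics to the test set is that both sets must be transformed with the same mean and scale. The following NumPy sketch illustrates the idea; it is not the library's implementation, the function name is hypothetical, and using the per-column standard deviation as the scale (with zero replaced by 1) is an assumption.

```python
import numpy as np


def standardize_sketch(X, mean=None, scale=None):
    """Standardize columns of ``X``; compute statistics only if not given."""
    X = np.asarray(X, dtype=float)
    if mean is None:
        mean = X.mean(axis=0)
    if scale is None:
        scale = X.std(axis=0)
        # Assumption: constant columns are left unscaled to avoid dividing by zero.
        scale = np.where(scale == 0, 1.0, scale)
    return (X - mean) / scale, mean, scale


# Statistics come from the training data and are reused for the test data.
X_train = [[1.0, 10.0], [3.0, 30.0]]
X_test = [[2.0, 20.0]]
train_std, mean, scale = standardize_sketch(X_train)
test_std, _, _ = standardize_sketch(X_test, mean, scale)
```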

normalize_y(mean: ArrayLike = None, scale: ArrayLike = None)[source]

Standardize the target variable of the dataset. If mean and scale are not provided, they are computed from the dataset. If they are provided, they are used to standardize the dataset.

Parameters:
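  • mean (array-like, default=None) – Mean subtracted from the target variable. If None, it is computed from the dataset.

  • scale (array-like, default=None) – Scale by which the target variable is divided. If None, it is computed from the dataset.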

Returns:

self – The dataset with standardized target variable.

Return type:

FeatureDataset

Example

>>> train_data = FeatureDataset("train.csv")
>>> train_data.normalize_y()
>>> test_data = FeatureDataset("test.csv")
>>> test_data.normalize_y(train_data.y_mean, train_data.y_scale)

class dlordinal.datasets.HCI(root: str | Path, transform: Callable | None = None, target_transform: Callable | None = None, is_valid_file: Callable[[str], bool] | None = None, train: bool = True)[source]

Historical Color Images (HCI) Decade Database dataset (Palermo et al. [1]).

This dataset contains colour photographs from five decades (1930s-1970s), organised for decade classification. Upon first use, the dataset is automatically downloaded, verified, preprocessed, and split into training and test subsets.

The preprocessing pipeline includes:

  • verifying and downloading the dataset archive if necessary;

  • extracting and normalising directory names according to class labels;

  • resizing all images to 224x224 pixels;

  • creating a stratified 70/30 train/test split;

  • generating an MD5 checksum file for future integrity checks.

Parameters:
  • root (str or Path) – Root directory where the dataset will be stored and processed.

  • transform (callable, optional) – A function/transform applied to each loaded PIL image.

  • target_transform (callable, optional) – A function/transform applied to the target label.

  • is_valid_file (callable, optional) – A function that takes a file path and returns True if the file should be included.

  • train (bool, default=True) – If True, loads the training split; otherwise, loads the test split.

URL

Download URL for the dataset archive.

Type:

str

MD5

MD5 checksum used to verify the downloaded archive.

Type:

str
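
Verifying an archive against an MD5 checksum can be sketched with the standard library as shown below. The helper names are hypothetical and the class may perform this check differently; the sketch only shows the general technique of hashing the file in chunks and comparing hex digests.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected_md5):
    """Return True if the file's MD5 digest matches ``expected_md5``."""
    return md5_of_file(path) == expected_md5.lower()
```

For example, archive_is_intact(archive_path, HCI.MD5) would compare a downloaded file against the class attribute, where archive_path is whatever location the archive was saved to.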

CATEGORIES

Mapping from decade names to numeric class labels (as strings).

Type:

dict

Example

>>> from dlordinal.datasets.hci import HCI
>>> dataset = HCI(root="data", train=True)
>>> img, label = dataset[0]

Notes

The train/test split is stratified by decade, with 70% of the images in the training set and 30% in the test set. Preprocessing is only performed the first time the dataset is initialised.
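
A stratified 70/30 split keeps the proportion of each decade roughly the same in both subsets. The sketch below shows the idea using only the standard library; the function name, the fixed seed, and the rounding rule are assumptions, and the class may use a different procedure internally.

```python
import random
from collections import defaultdict


def stratified_split_sketch(labels, train_fraction=0.7, seed=0):
    """Split sample indices so each class keeps ``train_fraction`` in train."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)
```

Splitting per class, rather than shuffling the whole dataset once, is what prevents a rare decade from ending up almost entirely in one subset.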