Dataset loading utilities
- class dlordinal.datasets.Adience(root: str | Path, train: bool = True, ranges: list = [(0, 2), (4, 6), (8, 13), (15, 20), (25, 32), (38, 43), (48, 53), (60, 100)], test_size: float = 0.2, transform: Callable | None = None, target_transform: Callable | None = None, verbose: bool = False)
Base class for the Adience dataset.
- Parameters:
root (Union[str, Path]) – Root directory where the datasets are stored. The Adience dataset is expected to be located under the adience directory inside the root directory. In the adience directory, the following files are expected: 1) aligned.tar.gz: a tar.gz file containing the images; 2) folds: a directory containing the folds. Each fold is expected to be a file named fold_{f}_data.txt, where f is the fold number starting from 0. These files can be downloaded from the Adience website (https://talhassner.github.io/home/projects/Adience/Adience-data.html).
train (bool, optional, default = True) – Whether to use the training or test partition.
ranges (list, optional) – List of age ranges to use, by default [(0, 2), (4, 6), (8, 13), (15, 20), (25, 32), (38, 43), (48, 53), (60, 100)].
test_size (float, optional, default = 0.2) – Fraction of the dataset to use for testing.
transform (Callable, optional) – A callable that takes in a PIL image and returns a transformed version.
target_transform (Callable, optional) – A callable that takes in the target and transforms it.
verbose (bool, optional, default = False) – Whether to print progress messages.
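The directory layout described for root can be checked before constructing the dataset. The following is a minimal sketch using only the standard library; the helper name missing_adience_files and the default of five folds are assumptions made for illustration, since the text above only states that fold numbering starts at 0.

```python
from pathlib import Path


def missing_adience_files(root, n_folds=5):
    """Return the expected Adience paths that do not exist under ``root``.

    Hypothetical helper, not part of dlordinal. ``n_folds=5`` is an
    assumption; adjust it to the number of fold files you downloaded.
    """
    base = Path(root) / "adience"
    # The image archive and one text file per fold are expected.
    expected = [base / "aligned.tar.gz"]
    expected += [base / "folds" / f"fold_{f}_data.txt" for f in range(n_folds)]
    return [path for path in expected if not path.exists()]
```

An empty return value means every expected file was found; otherwise the list names the paths that still need to be downloaded.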
- root
Root directory where the datasets are stored.
- Type:
Path
- train
Whether to use the training or test partition.
- Type:
bool
- ranges
List of age ranges to use to define the categories.
- Type:
list
- test_size
Fraction of the dataset to use for testing.
- Type:
float
- transform
A callable that takes in a PIL image and returns a transformed version.
- Type:
Callable
- target_transform
A callable that takes in the target and transforms it.
- Type:
Callable
- verbose
Whether to print progress messages.
- Type:
bool
- data
List of image paths.
- Type:
list
- targets
Contains the target of each sample in the dataset.
- Type:
list
- classes
Unique classes in the dataset.
- Type:
list
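To illustrate how ranges can define ordinal categories, the sketch below maps an age to the position of the range that contains it. It is an assumption that bounds are inclusive and that the class label equals the position in the list; the function age_to_class is hypothetical and is not taken from the library.

```python
DEFAULT_RANGES = (
    (0, 2), (4, 6), (8, 13), (15, 20),
    (25, 32), (38, 43), (48, 53), (60, 100),
)


def age_to_class(age, ranges=DEFAULT_RANGES):
    """Return the index of the first range containing ``age``, or None.

    Assumes inclusive bounds; ages that fall in the gaps between ranges
    (for example 3 or 22) belong to no category and yield None.
    """
    for label, (low, high) in enumerate(ranges):
        if low <= age <= high:
            return label
    return None
```

With the default ranges, this gives eight ordered classes, and any age in a gap between two ranges is left unassigned.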
- class dlordinal.datasets.FGNet(root: str | Path, download: bool = True, target_size: tuple = (128, 128), categories: list = [3, 11, 16, 24, 40], test_size: float = 0.2, validation_size: float = 0.15, train: bool = True, transform: Callable | None = None, target_transform: Callable | None = None)
Base class for the FGNet dataset.
- root
Root directory of the dataset.
- Type:
Path
- target_size
Size of the images after resizing.
- Type:
tuple
- categories
List of categories to be used.
- Type:
list
- test_size
Size of the test set.
- Type:
float
- validation_size
Size of the validation set.
- Type:
float
- transform
A function/transform that takes in a PIL image and returns a transformed version.
- Type:
callable, optional
- target_transform
A function/transform that takes in the target and transforms it.
- Type:
callable, optional
- data
Dataframe containing the dataset.
- Type:
pd.DataFrame
- Parameters:
root (str or Path) – Root directory of the dataset.
download (bool, optional, default = True) – If True, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
target_size (tuple, optional) – Size of the images after resizing. Default is (128, 128).
categories (list, optional) – List of categories to be used. Default is [3, 11, 16, 24, 40].
test_size (float, optional) – Size of the test set. Default is 0.2.
validation_size (float, optional) – Size of the validation set. Default is 0.15.
train (bool, optional) – If True, returns the training dataset, otherwise returns the test dataset. Default is True.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
- property classes: List[int]
Return the unique classes in the dataset.
- Returns:
List of unique classes.
- Return type:
list
- find_category(real_age)
Find the category of the real age.
- Parameters:
real_age (int) – Real age of the image.
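One plausible reading of categories is as a sorted list of age thresholds. The sketch below works under that assumption, treating each entry as an exclusive upper bound so that the default list yields six ordinal categories. The boundary handling and the name find_category_sketch are assumptions for illustration, not the library's confirmed behaviour.

```python
from bisect import bisect_right


def find_category_sketch(real_age, categories=(3, 11, 16, 24, 40)):
    """Map an age to an ordinal category index.

    Assumption: ``categories`` is sorted and each entry is an exclusive
    upper bound, so ages below 3 map to 0 and ages of 40 or more map to 5.
    """
    # bisect_right counts how many thresholds are <= real_age.
    return bisect_right(categories, real_age)
```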
- get_age_from_filename(filename)
Get the age from the filename.
- Parameters:
filename (str) – Filename of the image.
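As an illustration of extracting an age from a filename, the sketch below assumes names of the form subject id, the letter A, then the age, optionally followed by a suffix (for example 001A02.JPG). That naming pattern and the helper age_from_filename are assumptions; check them against the files you actually have.

```python
import re
from pathlib import Path

# Assumed pattern: digits (subject id), "A", digits (age), anything after.
_AGE_PATTERN = re.compile(r"^\d+A(\d+)", re.IGNORECASE)


def age_from_filename(filename):
    """Return the age encoded in ``filename`` under the assumed pattern."""
    name = Path(filename).name  # ignore any leading directories
    match = _AGE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Cannot parse an age from {filename!r}")
    return int(match.group(1))
```

Raising ValueError on a non-matching name makes unexpected files visible instead of silently assigning them an age.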
- load_data(original_path: Path)
Load the data from the original_path.
- Parameters:
original_path (Path) – Path to the original dataset.
- process(original_path, processed_path)
Process the FGNet dataset and save it in the processed_path.
- Parameters:
original_path (Path) – Path to the original dataset.
processed_path (Path) – Path to save the processed dataset.
- process_images_from_df(df: DataFrame, original_path: Path, processed_path: Path)
Process the images from the dataframe.
- Parameters:
df (pd.DataFrame) – Dataframe with the images.
original_path (Path) – Path to the original dataset.
processed_path (Path) – Path to save the processed dataset.
- split(original_csv_path: Path, train_csv_path: Path, test_csv_path: Path, original_images_path: Path, train_images_path: Path, test_images_path: Path)
Split the FGNet dataset into train and test sets.
- Parameters:
original_csv_path (Path) – Path to the original csv file.
train_csv_path (Path) – Path to save the train csv file.
test_csv_path (Path) – Path to save the test csv file.
original_images_path (Path) – Path to the original images.
train_images_path (Path) – Path to save the train images.
test_images_path (Path) – Path to save the test images.
- split_dataframe(csv_path: Path, train_images_path: Path, original_images_path: Path, test_images_path: Path)
Split the dataframe into train and test sets.
- Parameters:
csv_path (Path) – Path to the csv file.
train_images_path (Path) – Path to save the train images.
original_images_path (Path) – Path to the original images.
test_images_path (Path) – Path to save the test images.
- property targets: List[int]
Return the targets of the dataset.
- Returns:
List of targets.
- Return type:
list
- class dlordinal.datasets.FeatureDataset(filename)
PyTorch dataset implementation for a standard tabular dataset whose features are stored in a CSV file. The last column is the target variable.
Example
>>> from torch.utils.data import DataLoader
>>> from dlordinal.datasets import FeatureDataset
>>> train_data = FeatureDataset("train.csv")
>>> train_data.normalize_X()
>>> train_data.normalize_y()
>>> train_loader = DataLoader(train_data, batch_size=32, shuffle=True)
>>> for X, y in train_loader:
...     print(X.shape, y.shape)
>>> test_data = FeatureDataset("test.csv")
>>> test_data.normalize_X(train_data.X_mean, train_data.X_scale)
>>> test_data.normalize_y(train_data.y_mean, train_data.y_scale)
>>> test_loader = DataLoader(test_data, batch_size=32, shuffle=False)
>>> for X, y in test_loader:
...     print(X.shape, y.shape)
- get_valid_shape_array(v: ArrayLike)
Convert the input ArrayLike object to a 2D numpy array with shape (n, 1) if it is a 1D array.
- Parameters:
v (ArrayLike) – Input array.
- Returns:
v – 2D numpy array with shape (n, 1).
- Return type:
np.ndarray
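The reshaping described above can be sketched with NumPy as follows. The function to_column is a hypothetical stand-in written for illustration; leaving inputs that are not 1D unchanged is an assumption.

```python
import numpy as np


def to_column(v):
    """Return ``v`` as an array, reshaping 1D input to shape (n, 1)."""
    arr = np.asarray(v)
    if arr.ndim == 1:
        # -1 lets NumPy infer n from the number of elements.
        arr = arr.reshape(-1, 1)
    return arr
```

A column shape of (n, 1) is convenient because many scalers and models expect a 2D array with one row per sample.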
- normalize_X(mean: ArrayLike | None = None, scale: ArrayLike | None = None)
Standardize the features of the dataset. If mean and scale are not provided, they are computed from the dataset. If they are provided, they are used to standardize the dataset.
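The two modes described above (compute the statistics, or reuse supplied ones) can be sketched in NumPy as below. This is an illustration and not the library's implementation: the function standardize_features, the use of the population standard deviation as the scale, and the guard for constant columns are all assumptions.

```python
import numpy as np


def standardize_features(X, mean=None, scale=None):
    """Standardize columns of ``X``; return (X_std, mean, scale).

    If ``mean`` or ``scale`` is None it is computed from ``X`` itself,
    otherwise the supplied value is reused (e.g. training statistics
    applied to a test set).
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0) if mean is None else np.asarray(mean, dtype=float)
    scale = X.std(axis=0) if scale is None else np.asarray(scale, dtype=float)
    # Avoid division by zero for constant columns (assumed behaviour).
    scale = np.where(scale == 0, 1.0, scale)
    return (X - mean) / scale, mean, scale


# Fit on the training data, then reuse the statistics on the test data.
X_train_std, mean, scale = standardize_features([[1.0, 10.0], [3.0, 30.0]])
X_test_std, _, _ = standardize_features([[2.0, 20.0]], mean, scale)
```

Reusing the training mean and scale on the test data keeps both sets on the same scale and avoids leaking test statistics into preprocessing.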
- Parameters:
mean (array-like, default=None) – Mean of the dataset.
scale (array-like, default=None) – Scale of the dataset.
- Returns:
self – The dataset with standardized features.
- Return type:
FeatureDataset
Example
>>> train_data = FeatureDataset("train.csv")
>>> train_data.normalize_X()
>>> test_data = FeatureDataset("test.csv")
>>> test_data.normalize_X(train_data.X_mean, train_data.X_scale)
- normalize_y(mean: ArrayLike = None, scale: ArrayLike = None)
Standardize the target variable of the dataset. If mean and scale are not provided, they are computed from the dataset. If they are provided, they are used to standardize the dataset.
- Parameters:
mean (array-like, default=None) – Mean of the dataset.
scale (array-like, default=None) – Scale of the dataset.
- Returns:
self – The dataset with standardized target variable.
- Return type:
FeatureDataset
Example
>>> train_data = FeatureDataset("train.csv")
>>> train_data.normalize_y()
>>> test_data = FeatureDataset("test.csv")
>>> test_data.normalize_y(train_data.y_mean, train_data.y_scale)
- class dlordinal.datasets.HCI(root: str | Path, transform: Callable | None = None, target_transform: Callable | None = None, is_valid_file: Callable[[str], bool] | None = None, train: bool = True)
Historical Color Images (HCI) Decade Database dataset (Palermo et al. [1]).
This dataset contains colour photographs from five decades (1930s-1970s), organised for decade classification. Upon first use, the dataset is automatically downloaded, verified, preprocessed, and split into training and test subsets.
The preprocessing pipeline includes:
- verifying and downloading the dataset archive if necessary;
- extracting and normalising directory names according to class labels;
- resizing all images to 224x224 pixels;
- creating a stratified 70/30 train/test split;
- generating an MD5 checksum file for future integrity checks.
- Parameters:
root (str or Path) – Root directory where the dataset will be stored and processed.
transform (callable, optional) – A function/transform applied to each loaded PIL image.
target_transform (callable, optional) – A function/transform applied to the target label.
is_valid_file (callable, optional) – A function that takes a file path and returns True if the file should be included.
train (bool, default=True) – If True, loads the training split; otherwise, loads the test split.
- URL
Download URL for the dataset archive.
- Type:
str
- MD5
MD5 checksum used to verify the downloaded archive.
- Type:
str
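Verifying an archive against an MD5 checksum can be sketched with the standard library as follows. The helper md5_matches is hypothetical and only illustrates the general technique; it is not the routine the class uses internally.

```python
import hashlib


def md5_matches(path, expected_md5, chunk_size=1 << 20):
    """Return True if the file at ``path`` has the given MD5 hex digest."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5.lower()
```

A mismatch usually indicates an incomplete or corrupted download, in which case the archive should be fetched again.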
- CATEGORIES
Mapping from decade names to numeric class labels (as strings).
- Type:
dict
Example
>>> from dlordinal.datasets.hci import HCI
>>> dataset = HCI(root="data", train=True)
>>> img, label = dataset[0]
Notes
The train/test split is stratified by decade, with 70% of the images in the training set and 30% in the test set. Preprocessing is only performed the first time the dataset is initialised.
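A stratified 70/30 split keeps the class proportions roughly equal in both subsets by splitting each class separately. The sketch below shows the idea with the standard library only; the function stratified_split, the fixed seed, and the rounding rule are assumptions, and the library may implement its split differently.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_indices, test_indices), split per class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Each class contributes about train_fraction of its samples.
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test
```

Because the cut is computed per class, a decade with few images still contributes to both subsets in about the same ratio as the larger decades.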