core.dataclasses

This module contains the classes that define the structure of the datasets used with core.model.BulkDGDModel.

class core.dataclasses.GeneExpressionDataset(df)

Class implementing a dataset containing gene expression data for multiple samples.

This class is designed so that it can be used with the torch.utils.data.DataLoader utility, if needed.
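Being usable with torch.utils.data.DataLoader generally means the class behaves as a map-style dataset, exposing `__len__` and `__getitem__`. The sketch below shows that pattern with a hypothetical `MinimalExpressionDataset`. It is not the actual implementation of GeneExpressionDataset, and what the real class returns from `__getitem__` is not documented here. Storing the data as float32 and returning one row per item are assumptions.

```python
import pandas as pd
import torch
from torch.utils.data import DataLoader, Dataset


class MinimalExpressionDataset(Dataset):
    """Hypothetical stand-in, NOT core.dataclasses.GeneExpressionDataset."""

    def __init__(self, df: pd.DataFrame) -> None:
        # Rows are samples, columns are genes (float32 is an assumption).
        self.data_exp = torch.tensor(df.to_numpy(), dtype=torch.float32)

    def __len__(self) -> int:
        # DataLoader uses this to know how many samples there are.
        return self.data_exp.shape[0]

    def __getitem__(self, idx: int) -> torch.Tensor:
        # Assumed item: the expression of all genes in one sample (one row).
        return self.data_exp[idx]


df = pd.DataFrame(
    [[123, 12, 2342, 145], [189, 184, 2397, 1980], [978, 9467, 563, 23]],
    index=["sample_1", "sample_2", "sample_3"],
    columns=["gene_1", "gene_2", "gene_3", "gene_4"],
)

# The default collate function stacks the per-sample rows into batches.
loader = DataLoader(MinimalExpressionDataset(df), batch_size=2, shuffle=False)
batches = [batch for batch in loader]
```

With three samples and `batch_size=2`, the loader yields one batch of shape (2, 4) and one of shape (1, 4).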

__init__(df)

Initialize an instance of the class.

Parameters:
df : pandas.DataFrame

A data frame whose rows must represent samples, and columns must represent genes.

Therefore, each cell of the data frame holds the expression of the gene in that column for the sample in that row.

For example:

,gene_1,gene_2,gene_3,gene_4
sample_1,123,12,2342,145
sample_2,189,184,2397,1980
sample_3,978,9467,563,23
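If the example above is stored as CSV, pandas can read it into a data frame with the expected layout. Passing `index_col=0` makes the first, unnamed column the row index, so rows are labeled by sample and columns by gene. The inline `csv_text` string stands in for a file on disk and is only illustrative.

```python
import io

import pandas as pd

csv_text = """,gene_1,gene_2,gene_3,gene_4
sample_1,123,12,2342,145
sample_2,189,184,2397,1980
sample_3,978,9467,563,23
"""

# index_col=0 turns the first column into the row index (sample names).
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Each cell is the expression of one gene (column) in one sample (row).
value = df.loc["sample_3", "gene_2"]
```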

property data_exp

A 2D tensor where:

  • The first dimension has a length equal to the number of samples in the dataset.

  • The second dimension has a length equal to the number of genes whose expression is reported in the dataset.

property genes

The names of the genes included in the dataset.

property mean_exp

A 1D tensor, with length equal to the number of samples in the dataset, containing the mean gene expression of each sample (that is, the average over all genes in that sample).
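Given a `data_exp`-style tensor (samples × genes), a per-sample mean like this can be obtained by averaging over the gene dimension. The sketch below shows that computation on the example values. Whether the class computes `mean_exp` exactly this way is an assumption.

```python
import torch

# Samples on dim 0, genes on dim 1 (values from the example data frame).
data_exp = torch.tensor(
    [[123.0, 12.0, 2342.0, 145.0],
     [189.0, 184.0, 2397.0, 1980.0],
     [978.0, 9467.0, 563.0, 23.0]]
)

# Averaging over dim=1 (genes) leaves one value per sample.
mean_exp = data_exp.mean(dim=1)
```

For the example data, this gives a tensor of length 3 with values 655.5, 1187.5, and 2757.75.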

property samples

The names/IDs/indexes of the samples in the dataset.
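Given the data frame layout required for `df`, the gene names correspond to its column labels and the sample names/IDs to its row index. The sketch below shows that mapping. Whether `genes` and `samples` return Python lists or pandas Index objects is not stated above, so the `.tolist()` calls are an assumption.

```python
import pandas as pd

df = pd.DataFrame(
    [[123, 12, 2342, 145], [189, 184, 2397, 1980], [978, 9467, 563, 23]],
    index=["sample_1", "sample_2", "sample_3"],
    columns=["gene_1", "gene_2", "gene_3", "gene_4"],
)

# Assumed mapping: genes from the column labels, samples from the row index.
genes = df.columns.tolist()
samples = df.index.tolist()
```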