ioutil - utilities for I/O operations

ioutil.load_config_model(config_file)

Load, from a YAML file, the configuration specifying the DGD model’s parameters and, possibly, the path to the files containing the trained model.

Parameters:
config_file : str

The YAML configuration file.

Returns:
config : dict

A dictionary containing the configuration.

ioutil.load_config_rep(config_file)

Load, from a YAML file, the configuration containing the options for the optimization round(s) used to find the best representations for a set of samples.

Parameters:
config_file : str

The YAML configuration file.

Returns:
config : dict

A dictionary containing the configuration.

ioutil.load_config_train(config_file)

Load the configuration containing the options for training the model from a YAML file.

Parameters:
config_file : str

The YAML configuration file.

Returns:
config : dict

A dictionary containing the configuration.

ioutil.load_config_plot(config_file)

Load a configuration for a plot from a YAML file.

Parameters:
config_file : str

A YAML configuration file.

Returns:
config : dict

A dictionary containing the configuration.

ioutil.load_config_genes(config_file)

Load the configuration for creating a new list of genes from a YAML file.

Parameters:
config_file : str

A YAML configuration file.

Returns:
config : dict

A dictionary containing the configuration.
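All five load_config_* functions read a YAML file into a dictionary. A minimal sketch of that step, assuming PyYAML's yaml.safe_load is used (the real functions may also check or post-process the configuration), could look like the following. The helper name load_yaml_config and the configuration keys in the example are hypothetical.

```python
import io

import yaml  # PyYAML


def load_yaml_config(config_file):
    """Read a YAML file (path or file-like object) into a dictionary.

    Hypothetical helper: a sketch of what the load_config_* functions
    are assumed to do, not the bulkDGD implementation.
    """
    if isinstance(config_file, str):
        with open(config_file, "r") as f:
            config = yaml.safe_load(f)
    else:
        config = yaml.safe_load(config_file)
    # An empty YAML document yields None; normalize it to an empty dict.
    return config if config is not None else {}


# Hypothetical configuration keys, for illustration only.
example = io.StringIO(
    "n_epochs: 10\n"
    "optimizer:\n"
    "  name: adam\n"
    "  lr: 0.001\n"
)
config = load_yaml_config(example)
print(config["optimizer"]["lr"])  # 0.001
```

Nested YAML mappings become nested dictionaries, so options are accessed with chained keys.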

ioutil.load_decoder_outputs(csv_file, sep=',', split=True)

Load the decoder’s outputs from a CSV file.

Parameters:
csv_file : str

A CSV file containing a data frame with the decoder’s outputs.

Each row should represent the decoder’s output for a given representation, while each column should contain either the values of the output or additional information about it.

sep : str, ","

The column separator in the input CSV file.

split : bool, True

Whether to split the input data frame into two data frames, one with only the columns containing the decoder’s outputs and the other containing only the columns with additional information, if any were found.

Returns:
df_data : pandas.DataFrame

A data frame containing the decoder’s outputs.

Here, each row represents the decoder’s output for a given representation, and the columns contain the values of the output.

If split is False, this data frame will also contain the columns with additional information about the output, if any were found.

df_other_data : pandas.DataFrame

A data frame containing additional information about the decoder’s outputs found in the input data frame.

Here, each row represents the decoder’s output for a given representation, and the columns contain additional information provided in the input data frame.

If split is False, only df_data is returned.
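The split behaviour described above can be sketched with pandas as follows. This is not the bulkDGD implementation: the helper load_with_split and its data_columns argument are hypothetical, and the sketch assumes that the first CSV column holds the row names and that the caller knows which columns contain the decoder’s outputs. Note that the return type depends on split: a tuple of two data frames when it is True, a single data frame when it is False.

```python
import io

import pandas as pd


def load_with_split(csv_file, data_columns, sep=",", split=True):
    """Hypothetical sketch: read a CSV file and optionally split its columns."""
    # Assumption: the first column of the CSV holds the row names.
    df = pd.read_csv(csv_file, sep=sep, index_col=0)
    if not split:
        # A single data frame containing every column.
        return df
    wanted = set(data_columns)
    output_cols = [c for c in df.columns if c in wanted]
    other_cols = [c for c in df.columns if c not in wanted]
    return df[output_cols], df[other_cols]


csv_text = (
    ",ENSG00000141510,ENSG00000012048,tissue\n"
    "rep_0,12.5,3.1,liver\n"
    "rep_1,8.0,0.4,lung\n"
)
genes = ["ENSG00000141510", "ENSG00000012048"]

df_data, df_other_data = load_with_split(io.StringIO(csv_text), genes)
df_all = load_with_split(io.StringIO(csv_text), genes, split=False)
```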

ioutil.load_samples(csv_file, sep=',', keep_samples_names=True, split=True)

Load, from a CSV file, the data frame containing the gene expression data for the samples of interest.

Parameters:
csv_file : str

A CSV file containing a data frame with the samples’ data.

The rows of the data frame should represent the samples, while the columns should represent the genes and any additional information about the samples.

Each column containing gene expression data must be named after the gene’s Ensembl ID.

sep : str, ","

The column separator in the input CSV file.

keep_samples_names : bool, True

Whether to keep the names/IDs/indexes assigned to the samples in the input data frame.

If True, the samples’ names/IDs/indexes are assumed to be in the first column of the input data frame.

split : bool, True

Whether to split the input data frame into two data frames, one with only the columns containing the gene expression data and the other containing only the columns with additional information about the samples, if any were found.

Returns:
df_data : pandas.DataFrame

A data frame containing the gene expression data.

Here, the rows represent the samples and the columns represent the genes. Therefore, each cell contains the expression of a gene in a specific sample.

If split is False, this data frame will also contain the columns with additional information about the samples, if any were found.

df_other_data : pandas.DataFrame

A data frame containing the additional information about the samples found in the input data frame.

If split is False, only df_data is returned.
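A sketch of how keep_samples_names and split could interact, assuming that the samples’ names are read as the row index (pandas index_col=0) and that gene columns are recognized by the human Ensembl gene ID prefix "ENSG". The helper read_samples is hypothetical and only illustrates the documented behaviour.

```python
import io

import pandas as pd


def read_samples(csv_file, sep=",", keep_samples_names=True, split=True):
    """Hypothetical sketch of reading a samples CSV; not the bulkDGD code."""
    # If the first column holds the samples' names/IDs, use it as the index;
    # otherwise pandas assigns a default 0, 1, 2, ... index.
    index_col = 0 if keep_samples_names else None
    df = pd.read_csv(csv_file, sep=sep, index_col=index_col)
    if not split:
        return df
    # Assumption: gene columns are named after human Ensembl IDs ("ENSG...").
    gene_cols = [c for c in df.columns if str(c).startswith("ENSG")]
    other_cols = [c for c in df.columns if c not in gene_cols]
    return df[gene_cols], df[other_cols]


csv_text = (
    "sample,ENSG00000141510,ENSG00000012048,tissue\n"
    "s1,120,35,liver\n"
    "s2,98,0,lung\n"
)
df_expr, df_info = read_samples(io.StringIO(csv_text))

# A file without a names column, read with keep_samples_names=False.
csv_no_names = "ENSG00000141510,tissue\n5,liver\n"
df_expr2, df_info2 = read_samples(io.StringIO(csv_no_names),
                                  keep_samples_names=False)
```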

ioutil.load_representations(csv_file, sep=',', split=True)

Load the representations from a CSV file.

Parameters:
csv_file : str

A CSV file containing a data frame with the representations.

Each row should contain a representation. The columns should contain the representation’s values along the latent space’s dimensions and additional information about the representations, if present.

sep : str, ","

The column separator in the input CSV file.

split : bool, True

Whether to split the input data frame into two data frames, one with only the columns with the representations’ values along the latent space’s dimensions, and the other containing only the columns with additional information about the representations, if any were found.

Returns:
df_data : pandas.DataFrame

A data frame containing the representations’ values along the latent space’s dimensions.

Here, each row contains a representation and the columns contain the representation’s values along the latent space’s dimensions.

If split is False, this data frame will also contain the columns with additional information about the representations, if any were found.

df_other_data : pandas.DataFrame

A data frame containing the additional information about the representations found in the input data frame.

Here, each row contains a representation and the columns contain additional information about the representation provided in the input data frame.

If split is False, only df_data is returned.
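When no explicit list of latent-space columns is at hand, one possible heuristic is to treat numeric columns as the representation’s values and everything else as additional information. The sketch below uses pandas select_dtypes with a hypothetical split_by_dtype helper and hypothetical column names dim_0 and dim_1. It would misclassify numeric metadata columns, so it only approximates what load_representations is documented to do.

```python
import io

import pandas as pd


def split_by_dtype(df):
    """Split a data frame into numeric and non-numeric columns.

    Assumption for illustration: the representation's values are the only
    numeric columns. An explicit column list is safer when available.
    """
    df_values = df.select_dtypes(include="number")
    df_other = df.drop(columns=df_values.columns)
    return df_values, df_other


csv_text = (
    ",dim_0,dim_1,tissue\n"
    "rep_0,0.25,-1.5,liver\n"
    "rep_1,1.0,0.5,lung\n"
)
df_reps = pd.read_csv(io.StringIO(csv_text), index_col=0)
df_values, df_other = split_by_dtype(df_reps)
```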

ioutil.load_time(csv_file, sep=',')

Load the information about the CPU and wall clock time from a CSV file.

Parameters:
csv_file : str

A CSV file containing the time information.

sep : str, ","

The column separator in the input CSV file.

Returns:
df_time : pandas.DataFrame

A data frame containing the information about the CPU and wall clock time.

ioutil.load_loss(csv_file, sep=',')

Load the loss(es) from a CSV file.

Parameters:
csv_file : str

A CSV file containing the loss(es).

sep : str, ","

The column separator in the input CSV file.

Returns:
df_loss : pandas.DataFrame

A data frame containing the loss(es).

ioutil.save_representations(df, csv_file, sep=',')

Save the representations to a CSV file.

Parameters:
df : pandas.DataFrame

A data frame containing the representations and, possibly, additional information about them.

csv_file : str

The output CSV file.

sep : str, ","

The column separator in the output CSV file.

ioutil.save_samples(df, csv_file, sep=',')

Save the samples to a CSV file.

Parameters:
df : pandas.DataFrame

A data frame containing the samples.

csv_file : str

The output CSV file.

sep : str, ","

The column separator in the output CSV file.

ioutil.save_decoder_outputs(df, csv_file, sep=',')

Save the decoder’s outputs to a CSV file.

Parameters:
df : pandas.DataFrame

A data frame containing the decoder’s outputs.

csv_file : str

The output CSV file.

sep : str, ","

The column separator in the output CSV file.

ioutil.save_time(df, csv_file, sep=',')

Save the information about the CPU and wall clock time in a CSV file.

Parameters:
df : pandas.DataFrame

A data frame containing the time data.

csv_file : str

The output CSV file.

sep : str, ","

The column separator in the output CSV file.

ioutil.save_loss(df, csv_file, sep=',')

Save the loss(es) in a CSV file.

Parameters:
df : pandas.DataFrame

A data frame containing the loss(es).

csv_file : str

The output CSV file.

sep : str, ","

The column separator in the output CSV file.
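Every save_* function takes a sep argument, and a file should be read back with the same separator; if the separators differ, each line is typically parsed as a single column. The round trip below is a sketch using plain pandas (DataFrame.to_csv and read_csv), assuming the row index is written as the first column. Whether bulkDGD’s save functions write the index is an assumption here, and the file name is hypothetical.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame(
    {"dim_0": [0.25, 1.0], "dim_1": [-1.5, 0.5], "tissue": ["liver", "lung"]},
    index=["rep_0", "rep_1"],
)

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_file = os.path.join(tmp_dir, "representations.csv")
    # Write with the same separator the loader will be given (a tab here).
    df.to_csv(csv_file, sep="\t", index=True)
    # index_col=0 restores the row names written in the first column.
    df_back = pd.read_csv(csv_file, sep="\t", index_col=0)
```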

ioutil.preprocess_samples(df_samples, genes_txt_file=None)

Preprocess a set of new samples.

Parameters:
df_samples : pandas.DataFrame

A data frame containing the samples to be preprocessed.

The rows of the data frame should represent the samples, while the columns should represent the genes and any additional information about the samples.

genes_txt_file : str, optional

A plain text file containing the list of genes (identified by their Ensembl IDs) included in the DGD model.

If not provided, the file specified by bulkDGD.ioutil.defaults.GENES_FILE is used.

Returns:
df_preproc : pandas.DataFrame

The data frame with the preprocessed samples.

genes_excluded : list

The list of genes found in the input data frame but not included in the DGD model.

These genes are dropped from the df_preproc data frame.

genes_missing : list

A list of genes included in the DGD model but not found in the input data frame.

These genes are added with a count of 0 for all samples in the df_preproc data frame.
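The alignment described above can be sketched with pandas DataFrame.reindex, which drops columns not in the target list, orders the remaining ones as in that list, and fills newly added columns with a given value. The helper preprocess_sketch is hypothetical: it takes the gene list directly instead of reading genes_txt_file, and it assumes df_samples contains only gene-expression columns, so any other column would end up in genes_excluded.

```python
import pandas as pd


def preprocess_sketch(df_samples, genes):
    """Hypothetical sketch: align a samples data frame to the model's genes."""
    genes_set = set(genes)
    present = set(df_samples.columns)
    # Columns in the input that the model does not know about.
    genes_excluded = [c for c in df_samples.columns if c not in genes_set]
    # Model genes absent from the input.
    genes_missing = [g for g in genes if g not in present]
    # reindex drops excluded columns, orders columns as in 'genes',
    # and inserts missing genes with a count of 0 for every sample.
    df_preproc = df_samples.reindex(columns=genes, fill_value=0)
    return df_preproc, genes_excluded, genes_missing


model_genes = ["ENSG00000141510", "ENSG00000012048"]
df_samples = pd.DataFrame(
    {"ENSG00000141510": [120, 98], "ENSG00000146648": [7, 3]},
    index=["s1", "s2"],
)
df_preproc, genes_excluded, genes_missing = preprocess_sketch(
    df_samples, model_genes
)
```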