ioutil - utilities for I/O operations
- ioutil.load_config_model(config_file)
Load, from a YAML file, the configuration specifying the DGD model’s parameters and, possibly, the paths to the files containing the trained model.
- ioutil.load_config_rep(config_file)
Load, from a YAML file, the configuration containing the options for the optimization round(s) used to find the best representations for a set of samples.
- ioutil.load_config_train(config_file)
Load the configuration containing the options for training the model from a YAML file.
- ioutil.load_config_plot(config_file)
Load a configuration for a plot from a YAML file.
- ioutil.load_config_genes(config_file)
Load the configuration for creating a new list of genes from a YAML file.
- ioutil.load_decoder_outputs(csv_file, sep=',', split=True)
Load the decoder’s outputs from a CSV file.
- Parameters:
- csv_file
str A CSV file containing a data frame with the decoder’s outputs.
Each row should represent the decoder’s output for a given representation, while each column should contain either the values of the output or additional information about it.
- sep
str,"," The column separator in the input CSV file.
- split
bool,True Whether to split the input data frame into two data frames, one with only the columns containing the decoder’s outputs and the other containing only the columns with additional information, if any were found.
- Returns:
- df_data
pandas.DataFrame A data frame containing the decoder’s outputs.
Here, each row represents the decoder’s output for a given representation, and the columns contain the values of the output.
If split is False, this data frame will also contain the columns with additional information about the output, if any were found.
- df_other_data
pandas.DataFrame A data frame containing additional information about the decoder’s outputs found in the input data frame.
Here, each row represents the decoder’s output for a given representation, and the columns contain additional information provided in the input data frame.
If split is False, only df_data is returned.
- ioutil.load_samples(csv_file, sep=',', keep_samples_names=True, split=True)
Load the data frame containing the gene expression data for the samples of interest.
- Parameters:
- csv_file
str A CSV file containing a data frame with the samples’ data.
The rows of the data frame should represent the samples, while the columns should represent the genes and any additional information about the samples.
Each column containing gene expression data must be named after the gene’s Ensembl ID.
- sep
str,"," The column separator in the input CSV file.
- keep_samples_names
bool,True Whether to keep the names/IDs/indexes assigned to the samples in the input data frame.
If True, the samples’ names/IDs/indexes are assumed to be in the first column of the input data frame.
- split
bool,True Whether to split the input data frame into two data frames, one with only the columns containing the gene expression data and the other containing only the columns with additional information about the samples, if any were found.
- Returns:
- df_data
pandas.DataFrame A data frame containing the gene expression data.
Here, the rows represent the samples and the columns represent the genes. Therefore, each cell contains the expression of a gene in a specific sample.
If split is False, this data frame will also contain the columns with additional information about the samples, if any were found.
- df_other_data
pandas.DataFrame A data frame containing the additional information about the samples found in the input data frame.
If split is False, only df_data is returned.
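The split behaviour described above can be illustrated with plain pandas. This is a minimal sketch, not the bulkDGD implementation: the function name load_samples_sketch, the sample data, and the rule that gene columns are recognised by an "ENSG" prefix are all assumptions made for the example.

```python
import io

import pandas as pd

# Hypothetical input: the first column holds the samples' names, the "ENSG..."
# columns hold expression counts, and "tissue" is additional information.
CSV_TEXT = """sample,ENSG00000000003,ENSG00000000005,tissue
s1,10,0,liver
s2,3,7,lung
"""


def load_samples_sketch(csv_file, sep=",", keep_samples_names=True, split=True):
    """Illustrative stand-in for the loader; not the library's code."""
    # Use the first column as the index only when the samples' names are kept.
    index_col = 0 if keep_samples_names else None
    df = pd.read_csv(csv_file, sep=sep, index_col=index_col)
    if not split:
        # A single data frame with both gene and non-gene columns.
        return df
    # Assumption: gene columns are recognised by the Ensembl "ENSG" prefix.
    is_gene = df.columns.str.startswith("ENSG")
    return df.loc[:, is_gene], df.loc[:, ~is_gene]


df_data, df_other_data = load_samples_sketch(io.StringIO(CSV_TEXT))
```

With split left at True, two data frames come back; with split set to False, the same call returns a single data frame holding every column.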
- ioutil.load_representations(csv_file, sep=',', split=True)
Load the representations from a CSV file.
- Parameters:
- csv_file
str A CSV file containing a data frame with the representations.
Each row should contain a representation. The columns should contain the representation’s values along the latent space’s dimensions and additional information about the representations, if present.
- sep
str,"," The column separator in the input CSV file.
- split
bool,True Whether to split the input data frame into two data frames, one with only the columns with the representations’ values along the latent space’s dimensions, and the other containing only the columns with additional information about the representations, if any were found.
- Returns:
- df_data
pandas.DataFrame A data frame containing the representations’ values along the latent space’s dimensions.
Here, each row contains a representation and the columns contain the representation’s values along the latent space’s dimensions.
If split is False, this data frame will also contain the columns with additional information about the representations, if any were found.
- df_other_data
pandas.DataFrame A data frame containing the additional information about the representations found in the input data frame.
Here, each row contains a representation and the columns contain additional information about the representation provided in the input data frame.
If split is False, only df_data is returned.
- ioutil.load_time(csv_file, sep=',')
Load the information about the CPU and wall clock time from a CSV file.
- Parameters:
- csv_file
str The input CSV file.
- sep
str,"," The column separator in the input CSV file.
- Returns:
- df_time
pandas.DataFrame A data frame containing the information about the CPU and wall clock time.
- ioutil.load_loss(csv_file, sep=',')
Load the loss(es) from a CSV file.
- Parameters:
- csv_file
str The input CSV file.
- sep
str,"," The column separator in the input CSV file.
- Returns:
- df_time
pandas.DataFrame A data frame containing the loss(es).
- ioutil.save_representations(df, csv_file, sep=',')
Save the representations to a CSV file.
- Parameters:
- df
pandas.DataFrame A data frame containing the representations, and, possibly, additional information about the representations.
- csv_file
str The output CSV file.
- sep
str,"," The column separator in the output CSV file.
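When a non-default separator is used for saving, the same separator has to be passed when the file is loaded again. The sketch below shows that round trip with plain pandas calls (DataFrame.to_csv and pandas.read_csv) rather than the ioutil functions themselves; the column names and values are hypothetical, and an in-memory buffer stands in for the CSV file.

```python
import io

import pandas as pd

# Hypothetical representations: two latent dimensions plus one extra column.
df = pd.DataFrame(
    {
        "latent_dim_1": [0.5, -0.25],
        "latent_dim_2": [1.5, 2.0],
        "tissue": ["liver", "lung"],
    },
    index=["s1", "s2"],
)

# An in-memory buffer stands in for the output CSV file.
buffer = io.StringIO()

# Write tab-separated values, keeping the index (the samples' names).
df.to_csv(buffer, sep="\t")

# Rewind and read back with the same separator, restoring the index.
buffer.seek(0)
df_back = pd.read_csv(buffer, sep="\t", index_col=0)
```

Reading the file with a different separator than the one used for writing would leave each row in a single column, so it helps to keep the two values together in the configuration.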
- ioutil.save_samples(df, csv_file, sep=',')
Save the samples to a CSV file.
- Parameters:
- df
pandas.DataFrame A data frame containing the samples.
- csv_file
str The output CSV file.
- sep
str,"," The column separator in the output CSV file.
- ioutil.save_decoder_outputs(df, csv_file, sep=',')
Save the decoder’s outputs to a CSV file.
- Parameters:
- df
pandas.DataFrame A data frame containing the decoder’s outputs.
- csv_file
str The output CSV file.
- sep
str,"," The column separator in the output CSV file.
- ioutil.save_time(df, csv_file, sep=',')
Save the information about the CPU and wall clock time in a CSV file.
- Parameters:
- df
pandas.DataFrame A data frame containing the time data.
- csv_file
str The output CSV file.
- sep
str,"," The column separator in the output CSV file.
- ioutil.save_loss(df, csv_file, sep=',')
Save the loss(es) in a CSV file.
- Parameters:
- df
pandas.DataFrame A data frame containing the loss(es).
- csv_file
str The output CSV file.
- sep
str,"," The column separator in the output CSV file.
- ioutil.preprocess_samples(df_samples, genes_txt_file=None)
Preprocess a set of new samples.
- Parameters:
- df_samples
pandas.DataFrame A data frame containing the samples to be preprocessed.
The rows of the data frame should represent the samples, while the columns should represent the genes and any additional information about the samples.
- genes_txt_file
str, optional A plain text file containing the list of genes (identified by their Ensembl IDs) included in the DGD model.
If not provided, the one defined in bulkDGD.ioutil.defaults.GENES_FILE will be used.
- Returns:
- df_preproc
pandas.DataFrame The data frame with the preprocessed samples.
- genes_excluded
list The list of genes found in the input data frame but not included in the DGD model.
These genes are dropped from the df_preproc data frame.
- genes_missing
list A list of genes included in the DGD model but not found in the input data frame.
These genes are added with a count of 0 for all samples in the df_preproc data frame.
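The relationship between the three return values can be sketched with plain pandas. This is an illustration under stated assumptions, not the library's code: the gene IDs and the in-memory gene list are made up, gene columns are assumed to be recognisable by an "ENSG" prefix, and keeping the additional columns at the end of the preprocessed data frame is a choice made for the example.

```python
import pandas as pd

# Hypothetical list of genes included in the model (in practice this would be
# read from a plain text file).
model_genes = ["ENSG01", "ENSG02", "ENSG03"]

# Hypothetical samples: ENSG99 is not in the model, and ENSG02 is absent.
df_samples = pd.DataFrame(
    {
        "ENSG01": [5, 1],
        "ENSG03": [0, 2],
        "ENSG99": [8, 8],
        "tissue": ["liver", "lung"],
    },
    index=["s1", "s2"],
)

# Assumption: gene columns are those whose names start with "ENSG".
gene_cols = [c for c in df_samples.columns if c.startswith("ENSG")]
other_cols = [c for c in df_samples.columns if not c.startswith("ENSG")]

# Genes in the input but not in the model, and genes in the model but not
# in the input.
genes_excluded = [g for g in gene_cols if g not in model_genes]
genes_missing = [g for g in model_genes if g not in gene_cols]

# Keep the model's genes in the model's order; absent genes get a count of 0.
df_genes = df_samples[gene_cols].reindex(columns=model_genes, fill_value=0)

# Append the additional (non-gene) columns after the gene columns.
df_preproc = pd.concat([df_genes, df_samples[other_cols]], axis=1)
```

Reindexing against the model's gene list drops the excluded genes and adds the missing ones in a single step, so the gene columns always come out in the same order.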