Configuration for creating an instance of the bulkDGD model
To create a new instance of core.model.BulkDGDModel, we need to set the following options:
- input_dim, the dimensionality of the model’s input.
- gmm_options, a dictionary of options to set up the Gaussian mixture model.
- dec_options, a dictionary of options to set up the decoder.
- genes_txt_file, the path to a plain text file containing the list of genes that will be included in the model.
If we want to load a model that has already been trained, we should also provide the following options:
- gmm_pth_file, a PyTorch file containing the trained parameters of the Gaussian mixture model.
- dec_pth_file, a PyTorch file containing the trained parameters of the decoder.
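As a rough sketch, the top-level options can be pictured as one Python dictionary whose keys match the option names above. The value of input_dim, the file paths, and the abbreviated contents of the two nested dictionaries are placeholders chosen for illustration, not values taken from bulkDGD.

```python
# Top-level options for a new (untrained) model.
# All values are illustrative placeholders.
config_untrained = {
    "input_dim": 50,                  # hypothetical dimensionality
    "gmm_options": {"n_comp": 45},    # abbreviated for brevity
    "dec_options": {"n_units_hidden_layers": [500, 8000]},  # abbreviated
    "genes_txt_file": "genes.txt",    # hypothetical path
}

# For a trained model: the same options plus the two parameter files.
config_trained = {
    **config_untrained,
    "gmm_pth_file": "gmm.pth",        # hypothetical path
    "dec_pth_file": "dec.pth",        # hypothetical path
}
```

The only difference between the two dictionaries is the pair of PyTorch files holding the trained parameters.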
This is what the gmm_options dictionary should look like:
{# Set the number of components in the Gaussian mixture model.
 #
 # Type: int.
 "n_comp" : 45,

 # Set the type of covariance matrix used by the Gaussian mixture
 # model.
 #
 # Type: str.
 #
 # Options:
 # - 'fixed' for a fixed covariance matrix.
 # - 'isotropic' for an isotropic covariance matrix.
 # - 'diagonal' for a diagonal covariance matrix.
 "cm_type" : "diagonal",

 # Set the prior distribution over the means of the components of
 # the Gaussian mixture model.
 #
 # Type: str.
 #
 # Options:
 # - 'softball' for a softball distribution.
 "means_prior_name" : "softball",

 # Set the options to set up the prior distribution (they vary
 # depending on the prior defined by 'means_prior_name').
 "means_prior_options" :

    # Set these options if 'means_prior_name' is 'softball'.
    {# Set the radius of the soft ball.
     #
     # Type: int.
     "radius" : 7,

     # Set the sharpness of the soft boundary of the ball.
     #
     # Type: int.
     "sharpness" : 10},

 # Set the prior distribution over the weights of the components of
 # the Gaussian mixture model.
 #
 # Type: str.
 #
 # Options:
 # - 'dirichlet' for a Dirichlet distribution.
 "weights_prior_name" : "dirichlet",

 # Set the options to set up the prior (they vary according to the
 # prior defined by 'weights_prior_name').
 "weights_prior_options" :

    # Set these options if 'weights_prior_name' is 'dirichlet'.
    {# Set the alpha of the Dirichlet distribution determining the
     # uniformity of the weights of the components in the Gaussian
     # mixture model.
     #
     # Type: int.
     "alpha": 5},

 # Set the prior distribution over the log-variances of the
 # components of the Gaussian mixture model.
 #
 # Type: str.
 #
 # Options:
 # - 'gaussian' for a Gaussian distribution.
 "log_var_prior_name" : "gaussian",

 # Set the options to set up the prior (they vary according to the
 # prior defined by 'log_var_prior_name').
 "log_var_prior_options" :

    # Set these options if 'log_var_prior_name' is 'gaussian'.
    {# Set the mean of the Gaussian distribution calculated as
     # 2 * log(mean).
     #
     # Type: float.
     "mean" : 0.1,

     # Set the standard deviation of the Gaussian distribution.
     #
     # Type: float.
     "stddev": 1.0},
}
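The comment on "mean" says that the mean of the Gaussian prior is calculated as 2 * log(mean). Assuming the logarithm is the natural one (the text does not say which base is used), the example value of 0.1 works out as follows; the variable names are only illustrative.

```python
import math

mean_option = 0.1  # value of "mean" in the example dictionary

# Mean of the Gaussian prior over the log-variances,
# assuming a natural logarithm.
prior_mean = 2 * math.log(mean_option)

# Variance whose logarithm equals that prior mean.
implied_variance = math.exp(prior_mean)

print(round(prior_mean, 4))        # -4.6052
print(round(implied_variance, 4))  # 0.01
```

Under this reading, the option behaves like a standard deviation: 0.1 squared is 0.01, and the logarithm of 0.01 is the prior mean.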
And this is what the dec_options dictionary should look like:
{# Set the number of units in the hidden layers.
 #
 # Type: list of int.
 "n_units_hidden_layers" : [500, 8000],

 # Set the activation function for each hidden layer.
 #
 # Type: list of str.
 #
 # Options:
 # - "relu" for the ReLU function.
 # - "elu" for the ELU function.
 "activations": ["relu", "relu"],

 # Set the name of the decoder's output module.
 #
 # Type: str.
 #
 # Options:
 # - 'nb_feature_dispersion' for negative binomial distributions
 #   with means learned per gene per sample and r-values learned per
 #   gene.
 # - 'nb_full_dispersion' for negative binomial distributions with
 #   both means and r-values learned per gene per sample.
 # - 'poisson' for Poisson distributions.
 "output_module_name" : "nb_feature_dispersion",

 # Set the options for the output module.
 "output_module_options" :

    {# Set the name of the activation function in the output module.
     #
     # Type: str.
     #
     # Options:
     # - 'sigmoid' for a sigmoid function.
     # - 'softplus' for a softplus function.
     "activation" : "softplus",

     # Set the initial r-value for the negative binomial
     # distributions modeling the genes' counts.
     #
     # Type: int.
     "r_init" : 2},
}
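Because "n_units_hidden_layers" and "activations" both describe the hidden layers, it seems reasonable to expect one activation per layer. The helper below is a hypothetical pre-check, not part of bulkDGD, and whether bulkDGD itself enforces these rules is an assumption. The allowed names are copied from the comments in the dictionary above.

```python
# Allowed values, as listed in the comments of the dec_options example.
ALLOWED_ACTIVATIONS = {"relu", "elu"}
ALLOWED_OUTPUT_MODULES = {
    "nb_feature_dispersion", "nb_full_dispersion", "poisson"}

def check_dec_options(dec_options):
    """Hypothetical helper: raise ValueError on inconsistent options."""
    units = dec_options["n_units_hidden_layers"]
    activations = dec_options["activations"]

    # Assumption: one activation function per hidden layer.
    if len(units) != len(activations):
        raise ValueError(
            f"{len(units)} hidden layers but "
            f"{len(activations)} activations.")

    unknown = set(activations) - ALLOWED_ACTIVATIONS
    if unknown:
        raise ValueError(f"Unknown activations: {sorted(unknown)}.")

    name = dec_options["output_module_name"]
    if name not in ALLOWED_OUTPUT_MODULES:
        raise ValueError(f"Unknown output module: '{name}'.")

# The example dictionary passes the check without raising.
check_dec_options({
    "n_units_hidden_layers": [500, 8000],
    "activations": ["relu", "relu"],
    "output_module_name": "nb_feature_dispersion",
    "output_module_options": {"activation": "softplus", "r_init": 2},
})
```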
If we are loading the options from a YAML configuration file similar to those provided in the bulkDGD/configs/model directory, we can set up the model as follows:
# Import 'ioutil' and the 'core.model' module.
from bulkDGD import ioutil
from bulkDGD.core import model
# Let's assume we load the 'model_untrained.yaml' configuration file.
# Load the configuration from the configuration file.
config = ioutil.load_config_model(config_file = "model_untrained")
# The configuration contains an 'input_dim' section, a 'gmm_options'
# section, and a 'dec_options' section.
# Initialize the model.
dgd_model = model.BulkDGDModel(**config)
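Since the configuration is unpacked directly into the constructor with **config, a missing top-level option would presumably surface only at that call. A small, hypothetical check (not a bulkDGD function), based on the required options listed at the top of this section, could be run first:

```python
# Required top-level options, as listed at the top of this section.
REQUIRED_OPTIONS = (
    "input_dim", "gmm_options", "dec_options", "genes_txt_file")

def missing_model_options(config):
    """Hypothetical helper: return the required options not in 'config'."""
    return [name for name in REQUIRED_OPTIONS if name not in config]

# Example with a deliberately incomplete, made-up configuration.
example_config = {"input_dim": 50, "gmm_options": {}, "dec_options": {}}
print(missing_model_options(example_config))  # ['genes_txt_file']
```

An empty list means that all four required options are present; the check says nothing about whether their values are valid.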