analysis.reduction
Utilities to perform dimensionality reduction.
- analysis.reduction.perform_kpca(df, fitted_model=None, options=None, input_columns=None, keep_unused_columns=True, output_columns_prefix='C')
Perform a kernel principal component analysis (KPCA) on a set of data points.
- Parameters:
- df
pandas.DataFrame A data frame containing the data points.
The rows of the data frame should represent the different data points, while the columns should represent the dimensions of the space where the data points live.
- fitted_model
sklearn.decomposition.KernelPCA, optional An already fitted model onto which the data points should be projected.
- options
dict, optional A dictionary containing the options used when performing the analysis.
The available options are those that can be used to initialize a sklearn.decomposition.KernelPCA instance.
- input_columns
str or list, optional Either a list containing the names of the columns whose contents should be used for the analysis or a string representing a pattern that the columns of interest should fit.
By default, all columns of the input data frame are used for the analysis.
- keep_unused_columns
bool, default True Whether to append the unused columns to the output data frame.
- output_columns_prefix
str, default "C" A string representing the prefix used for the columns of the output data frame.
- Returns:
- df_results
pandas.DataFrame A data frame containing the results of the analysis.
The rows will contain the data points, while the columns will contain the values of each data point’s projection along the dimensions of the projection space.
- pca
sklearn.decomposition.KernelPCA The fitted model.
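The following is a minimal sketch of the kind of computation described above, written directly against scikit-learn rather than against perform_kpca itself. It shows how an options dictionary maps onto the KernelPCA constructor and how prefixed output columns could be built. The sample data and the column naming scheme (prefix followed by a 1-based index) are assumptions made for illustration; the actual naming used by perform_kpca is not specified here.

```python
import pandas as pd
from sklearn.decomposition import KernelPCA

# Rows are data points, columns are the dimensions of the input space.
df = pd.DataFrame({
    "x": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [1.0, 0.5, 2.5, 2.0, 4.5, 4.0],
    "z": [2.0, 2.1, 1.9, 2.2, 1.8, 2.0],
})

# The keys are ordinary KernelPCA constructor arguments.
options = {"n_components": 2, "kernel": "rbf", "gamma": 0.1}

kpca = KernelPCA(**options)
projected = kpca.fit_transform(df.to_numpy())

# Assumed naming scheme: prefix + 1-based component index.
prefix = "C"
columns = [f"{prefix}{i + 1}" for i in range(projected.shape[1])]
df_results = pd.DataFrame(projected, columns=columns, index=df.index)
```

The result has one row per input data point and one column per retained component.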
- analysis.reduction.perform_mds(df, fitted_model=None, options=None, input_columns=None, keep_unused_columns=True, output_columns_prefix='C')
Perform multidimensional scaling (MDS) on a set of data points.
- Parameters:
- df
pandas.DataFrame A data frame containing the data points.
The rows of the data frame should represent the different data points, while the columns should represent the dimensions of the space where the data points live.
- fitted_model
sklearn.manifold.MDS, optional An already fitted model onto which the data points should be projected.
- options
dict, optional A dictionary containing the options used when performing the analysis.
The available options are those that can be used to initialize a sklearn.manifold.MDS instance.
- input_columns
str or list, optional Either a list containing the names of the columns whose contents should be used for the analysis or a string representing a pattern that the columns of interest should fit.
By default, all columns of the input data frame are used for the analysis.
- keep_unused_columns
bool, default True Whether to append the unused columns to the output data frame.
- output_columns_prefix
str, default "C" A string representing the prefix used for the columns of the output data frame.
- Returns:
- df_results
pandas.DataFrame A data frame containing the results of the analysis.
The rows will contain the data points, while the columns will contain the values of each data point’s projection along the dimensions of the projection space.
- mds
sklearn.manifold.MDS The fitted model.
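As a rough illustration of the underlying computation, the sketch below runs scikit-learn's MDS on a small data frame and wraps the embedding in a new data frame. It does not call perform_mds; the sample data and the column names C1 and C2 are assumptions. Note that scikit-learn's MDS provides fit and fit_transform but no separate transform method, so the way a fitted_model is reused by perform_mds is not covered by this sketch.

```python
import pandas as pd
from sklearn.manifold import MDS

# Five data points living in a three-dimensional space.
df = pd.DataFrame({
    "x": [0.0, 1.0, 0.0, 1.0, 0.5],
    "y": [0.0, 0.0, 1.0, 1.0, 0.5],
    "z": [0.0, 0.2, 0.1, 0.3, 1.0],
})

# The keys are ordinary MDS constructor arguments.
options = {"n_components": 2, "random_state": 0}

mds = MDS(**options)
embedding = mds.fit_transform(df.to_numpy())

# Assumed naming scheme for the output columns.
columns = [f"C{i + 1}" for i in range(embedding.shape[1])]
df_results = pd.DataFrame(embedding, columns=columns, index=df.index)

# The residual stress of the embedding is stored on the fitted model.
stress = mds.stress_
```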
- analysis.reduction.perform_pca(df, fitted_model=None, options=None, input_columns=None, keep_unused_columns=True, output_columns_prefix='C')
Perform a principal component analysis (PCA) on a set of data points.
- Parameters:
- df
pandas.DataFrame A data frame containing the data points.
The rows of the data frame should represent the different data points, while the columns should represent the dimensions of the space where the data points live.
- fitted_model
sklearn.decomposition.PCA, optional An already fitted model onto which the data points should be projected.
- options
dict, optional A dictionary containing the options used when performing the analysis.
The available options are those that can be used to initialize a sklearn.decomposition.PCA instance.
- input_columns
str or list, optional Either a list containing the names of the columns whose contents should be used for the analysis or a string representing a pattern that the columns of interest should fit.
By default, all columns of the input data frame are used for the analysis.
- keep_unused_columns
bool, default True Whether to append the unused columns to the output data frame.
- output_columns_prefix
str, default "C" A string representing the prefix used for the columns of the output data frame.
- Returns:
- df_results
pandas.DataFrame A data frame containing the results of the analysis.
The rows will contain the data points, while the columns will contain the values of each data point’s projection along the dimensions of the projection space.
- pca
sklearn.decomposition.PCA The fitted model.
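The sketch below illustrates, using scikit-learn and pandas directly, what projecting new data points onto an already fitted model and appending the unused columns could look like. It is not the implementation of perform_pca: the sample data, the column names, and the order in which the unused columns are appended are all assumptions.

```python
import pandas as pd
from sklearn.decomposition import PCA

df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.1, 3.9, 6.2, 8.1, 9.8],
    "c": [0.5, 0.1, 0.4, 0.9, 0.3],
    "label": ["p", "q", "r", "s", "t"],
})

# Only these columns enter the analysis; "label" is left unused.
input_columns = ["a", "b", "c"]
unused_columns = [c for c in df.columns if c not in input_columns]

# Fit once on the reference data.
pca = PCA(n_components=2)
pca.fit(df[input_columns].to_numpy())

# Later, project new data points onto the already fitted model.
df_new = pd.DataFrame({
    "a": [1.5, 4.5],
    "b": [3.0, 9.0],
    "c": [0.2, 0.6],
    "label": ["u", "v"],
})
projected = pca.transform(df_new[input_columns].to_numpy())

# Assumed naming scheme for the output columns.
columns = [f"C{i + 1}" for i in range(projected.shape[1])]
df_results = pd.DataFrame(projected, columns=columns, index=df_new.index)

# Append the unused columns, mirroring keep_unused_columns=True.
df_results = pd.concat([df_results, df_new[unused_columns]], axis=1)
```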
- analysis.reduction.perform_tsne(df, fitted_model=None, options=None, input_columns=None, keep_unused_columns=True, output_columns_prefix='C')
Perform t-distributed stochastic neighbor embedding (t-SNE) on a set of data points.
- Parameters:
- df
pandas.DataFrame A data frame containing the data points.
The rows of the data frame should represent the different data points, while the columns should represent the dimensions of the space where the data points live.
- fitted_model
sklearn.manifold.TSNE, optional An already fitted model onto which the data points should be projected.
- options
dict, optional A dictionary containing the options used when performing the analysis.
The available options are those that can be used to initialize a sklearn.manifold.TSNE instance.
- input_columns
str or list, optional Either a list containing the names of the columns whose contents should be used for the analysis or a string representing a pattern that the columns of interest should fit.
By default, all columns of the input data frame are used for the analysis.
- keep_unused_columns
bool, default True Whether to append the unused columns to the output data frame.
- output_columns_prefix
str, default "C" A string representing the prefix used for the columns of the output data frame.
- Returns:
- df_results
pandas.DataFrame A data frame containing the results of the analysis.
The rows will contain the data points, while the columns will contain the values of each data point’s projection along the dimensions of the projection space.
- tsne
sklearn.manifold.TSNE The fitted model.
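The sketch below shows, with scikit-learn and pandas only, how selecting input columns by a pattern and running t-SNE might fit together. It does not call perform_tsne. The pattern syntax accepted by input_columns is not specified above, so a regular expression is assumed here; the sample data and the output column names are likewise assumptions. Recent scikit-learn releases require perplexity to be smaller than the number of data points, which is why a small value is used for this eight-row example.

```python
import pandas as pd
from sklearn.manifold import TSNE

df = pd.DataFrame({
    "feat_1": [0.0, 0.1, 0.2, 0.1, 5.0, 5.1, 5.2, 4.9],
    "feat_2": [1.0, 1.2, 0.9, 1.1, 6.0, 6.2, 5.9, 6.1],
    "feat_3": [2.0, 1.9, 2.1, 2.2, 7.0, 7.1, 6.8, 7.2],
    "id": [10, 11, 12, 13, 14, 15, 16, 17],
})

# Assumption: the pattern is treated as a regular expression on column names.
pattern = r"^feat_"
df_inputs = df.filter(regex=pattern)

# The keys are ordinary TSNE constructor arguments.
options = {
    "n_components": 2,
    "perplexity": 3.0,   # must be smaller than the number of rows
    "init": "random",
    "random_state": 0,
}

tsne = TSNE(**options)
embedding = tsne.fit_transform(df_inputs.to_numpy())

# Assumed naming scheme for the output columns.
columns = [f"C{i + 1}" for i in range(embedding.shape[1])]
df_results = pd.DataFrame(embedding, columns=columns, index=df.index)
```

scikit-learn's TSNE has no transform method for unseen data, so an embedding is always computed by fitting on the data points it is given.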