analysis.reduction

Utilities to perform dimensionality reduction.

analysis.reduction.perform_kpca(df, fitted_model=None, options=None, input_columns=None, keep_unused_columns=True, output_columns_prefix='C')

Perform a kernel principal component analysis (KPCA) on a set of data points.

Parameters:
df : pandas.DataFrame

A data frame containing the data points.

The rows of the data frame should represent the different data points, while the columns should represent the dimensions of the space where the data points live.

fitted_model : sklearn.decomposition.KernelPCA, optional

An already fitted model onto which the data points should be projected.

options : dict, optional

A dictionary containing the options used when performing the analysis.

The available options are those that can be used to initialize a sklearn.decomposition.KernelPCA instance.

input_columns : str or list, optional

Either a list of the names of the columns to use for the analysis, or a string pattern that the columns of interest should match.

By default, all columns of the input data frame are used for the analysis.

keep_unused_columns : bool, default True

Whether to append the unused columns to the output data frame.

output_columns_prefix : str, default "C"

A string representing the prefix used for the columns of the output data frame.

Returns:
df_results : pandas.DataFrame

A data frame containing the results of the analysis.

The rows will contain the data points, while the columns will contain the values of each data point’s projection along the dimensions of the projection space.

pca : sklearn.decomposition.KernelPCA

The fitted model.
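
The sketch below illustrates, with scikit-learn used directly, the kind of steps that the options and output_columns_prefix parameters describe. It is not the implementation of perform_kpca: the toy data, the variable names, and the column numbering scheme (prefix followed by a 1-based index) are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import KernelPCA

# Toy data (assumed): 20 data points (rows) in a 3-dimensional space (columns).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(20, 3)), columns=["x", "y", "z"])

# Options are forwarded unchanged to the KernelPCA constructor.
options = {"n_components": 2, "kernel": "rbf", "gamma": 0.5}
kpca = KernelPCA(**options)

# Fit the model and project the data points in one step.
projected = kpca.fit_transform(df.to_numpy())

# Assumed naming scheme: the prefix followed by a 1-based component index.
prefix = "C"
columns = [f"{prefix}{i + 1}" for i in range(projected.shape[1])]
df_results = pd.DataFrame(projected, columns=columns, index=df.index)
```

With the documented signature, the corresponding call would presumably be perform_kpca(df, options=options), which returns both the results data frame and the fitted model.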

analysis.reduction.perform_mds(df, fitted_model=None, options=None, input_columns=None, keep_unused_columns=True, output_columns_prefix='C')

Perform multidimensional scaling (MDS) on a set of data points.

Parameters:
df : pandas.DataFrame

A data frame containing the data points.

The rows of the data frame should represent the different data points, while the columns should represent the dimensions of the space where the data points live.

fitted_model : sklearn.manifold.MDS, optional

An already fitted model onto which the data points should be projected.

options : dict, optional

A dictionary containing the options used when performing the analysis.

The available options are those that can be used to initialize a sklearn.manifold.MDS instance.

input_columns : str or list, optional

Either a list of the names of the columns to use for the analysis, or a string pattern that the columns of interest should match.

By default, all columns of the input data frame are used for the analysis.

keep_unused_columns : bool, default True

Whether to append the unused columns to the output data frame.

output_columns_prefix : str, default "C"

A string representing the prefix used for the columns of the output data frame.

Returns:
df_results : pandas.DataFrame

A data frame containing the results of the analysis.

The rows will contain the data points, while the columns will contain the values of each data point’s projection along the dimensions of the projection space.

mds : sklearn.manifold.MDS

The fitted model.
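
As an illustration of what keep_unused_columns describes, the sketch below embeds a subset of columns with scikit-learn's MDS and then appends the columns that were left out of the analysis. It is a sketch under stated assumptions, not the code of perform_mds: the toy data, the "label" column, and the output column names "C1" and "C2" are hypothetical. scikit-learn's MDS provides fit_transform rather than a separate transform step, so the points are embedded in a single call.

```python
import numpy as np
import pandas as pd
from sklearn.manifold import MDS

# Toy data (assumed): 15 points in 4 dimensions plus one non-numeric column.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(15, 4)), columns=["f1", "f2", "f3", "f4"])
df["label"] = ["a", "b", "c"] * 5  # not used for the analysis

# Columns used for the analysis, given as an explicit list of names.
input_columns = ["f1", "f2", "f3", "f4"]

# Options are forwarded unchanged to the MDS constructor.
options = {"n_components": 2, "n_init": 1, "random_state": 0}
mds = MDS(**options)

# Fit the model and embed the data points in one call.
embedding = mds.fit_transform(df[input_columns].to_numpy())

# Assumed output column names: prefix "C" with a 1-based index.
df_results = pd.DataFrame(embedding, columns=["C1", "C2"], index=df.index)

# Equivalent of keep_unused_columns=True: append the columns left out above.
unused_columns = [c for c in df.columns if c not in input_columns]
df_results = pd.concat([df_results, df[unused_columns]], axis=1)
```

Appending the unused columns keeps identifiers or labels next to the embedded coordinates, which is convenient when coloring a scatter plot of the embedding by group.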

analysis.reduction.perform_pca(df, fitted_model=None, options=None, input_columns=None, keep_unused_columns=True, output_columns_prefix='C')

Perform a principal component analysis (PCA) on a set of data points.

Parameters:
df : pandas.DataFrame

A data frame containing the data points.

The rows of the data frame should represent the different data points, while the columns should represent the dimensions of the space where the data points live.

fitted_model : sklearn.decomposition.PCA, optional

An already fitted model onto which the data points should be projected.

options : dict, optional

A dictionary containing the options used when performing the analysis.

The available options are those that can be used to initialize a sklearn.decomposition.PCA instance.

input_columns : str or list, optional

Either a list of the names of the columns to use for the analysis, or a string pattern that the columns of interest should match.

By default, all columns of the input data frame are used for the analysis.

keep_unused_columns : bool, default True

Whether to append the unused columns to the output data frame.

output_columns_prefix : str, default "C"

A string representing the prefix used for the columns of the output data frame.

Returns:
df_results : pandas.DataFrame

A data frame containing the results of the analysis.

The rows will contain the data points, while the columns will contain the values of each data point’s projection along the dimensions of the projection space.

pca : sklearn.decomposition.PCA

The fitted model.
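
The fitted_model parameter suggests a two-step workflow: fit a model on one data set, then project other data points onto it. The sketch below shows that workflow with scikit-learn's PCA used directly. It assumes, rather than reproduces, the behavior of perform_pca: the toy data frames, the variable names, and the 1-based column numbering are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Toy data (assumed): a reference set and a second set living in the same space.
rng = np.random.default_rng(2)
df_train = pd.DataFrame(rng.normal(size=(50, 5)), columns=list("abcde"))
df_test = pd.DataFrame(rng.normal(size=(10, 5)), columns=list("abcde"))

# Step 1: fit the model on the reference set only.
pca = PCA(n_components=3).fit(df_train.to_numpy())

# Assumed naming scheme: prefix "C" followed by a 1-based component index.
columns = [f"C{i + 1}" for i in range(pca.n_components_)]

# Step 2: project both sets onto the already fitted components.
df_train_results = pd.DataFrame(
    pca.transform(df_train.to_numpy()), columns=columns, index=df_train.index
)
df_test_results = pd.DataFrame(
    pca.transform(df_test.to_numpy()), columns=columns, index=df_test.index
)

# Fraction of the variance captured by each retained component.
explained = pca.explained_variance_ratio_
```

With the documented signature, the second step would presumably correspond to passing the model returned by a first call as fitted_model in a second call, so that both data frames share the same projection space.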

analysis.reduction.perform_tsne(df, fitted_model=None, options=None, input_columns=None, keep_unused_columns=True, output_columns_prefix='C')

Perform t-distributed stochastic neighbor embedding (t-SNE) on a set of data points.

Parameters:
df : pandas.DataFrame

A data frame containing the data points.

The rows of the data frame should represent the different data points, while the columns should represent the dimensions of the space where the data points live.

fitted_model : sklearn.manifold.TSNE, optional

An already fitted model onto which the data points should be projected.

options : dict, optional

A dictionary containing the options used when performing the analysis.

The available options are those that can be used to initialize a sklearn.manifold.TSNE instance.

input_columns : str or list, optional

Either a list of the names of the columns to use for the analysis, or a string pattern that the columns of interest should match.

By default, all columns of the input data frame are used for the analysis.

keep_unused_columns : bool, default True

Whether to append the unused columns to the output data frame.

output_columns_prefix : str, default "C"

A string representing the prefix used for the columns of the output data frame.

Returns:
df_results : pandas.DataFrame

A data frame containing the results of the analysis.

The rows will contain the data points, while the columns will contain the values of each data point’s projection along the dimensions of the projection space.

tsne : sklearn.manifold.TSNE

The fitted model.
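
The sketch below illustrates selecting the input columns by a string pattern before running t-SNE with scikit-learn directly. The documentation above does not say how the pattern is interpreted, so treating it as a regular expression matched against column names (via pandas' DataFrame.filter) is an assumption, as are the toy data, the column names, and the output names "C1" and "C2".

```python
import numpy as np
import pandas as pd
from sklearn.manifold import TSNE

# Toy data (assumed): three feature columns and one unrelated column.
rng = np.random.default_rng(3)
df = pd.DataFrame(
    rng.normal(size=(30, 4)),
    columns=["feat_a", "feat_b", "feat_c", "sample_id"],
)

# Assumption: the string pattern is a regular expression on column names.
pattern = r"^feat_"
selected = df.filter(regex=pattern)

# Options are forwarded unchanged to the TSNE constructor.
# The perplexity must be smaller than the number of data points.
options = {"n_components": 2, "perplexity": 5.0, "random_state": 0}
tsne = TSNE(**options)

# Fit the model and embed the selected columns in one call.
embedding = tsne.fit_transform(selected.to_numpy())

# Assumed output column names: prefix "C" with a 1-based index.
df_results = pd.DataFrame(embedding, columns=["C1", "C2"], index=df.index)
```

Fixing random_state makes the embedding reproducible between runs, which helps when comparing the effect of different options.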