xomics.PreProcess
- class xomics.PreProcess(col_id='protein_id', col_name='gene_name', str_quant='log2_lfq')
Bases: object
Pre-processing class for quantifications of omics data.
Methods
__init__([col_id, col_name, str_quant])
add_ids([df, list_ids]): Add column with protein ids to DataFrame.
add_significance([df, col_fc, col_pval, ...]): Add a column indicating significance based on thresholds for fold change and p-value.
apply_exp([df, cols, base, neg]): Apply an exponential transformation to specified columns of a DataFrame.
apply_log([df, cols, log2, neg]): Apply a logarithmic transformation to specified columns of a DataFrame.
filter_duplicated_names([df, col, ...]): Filter for duplicated items in columns (e.g., names).
filter_groups([df, groups, min_pct]): Remove samples with missing values unless one group has at least min_pct non-missing values.
filter_nan([df, groups, cols]): Filter missing values based on provided groups or columns (cols).
get_dict_group_qcols([df, groups]): Create a dictionary of groups from df and their corresponding columns with quantifications.
get_dict_qcol_group([df, groups]): Create a dictionary of quantification columns and the group each belongs to.
get_qcols([df, groups]): Create a list of quantification columns from df based on str_quant and the given groups.
run([df, groups, groups_ctrl, ...]): Perform pairwise t-tests for groups to obtain -log10 p-values and log2 fold changes, with optional p-value correction, nan policy, and log-scale output.
- get_qcols(df=None, groups=None)
Create a list of quantification columns from df based on str_quant and the given groups.
- Parameters:
- Returns:
List with all quantification columns across all groups
- Return type:
cols_quant
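As a rough illustration of what selecting quantification columns could look like, the sketch below keeps the column names that contain both str_quant and one of the group names. The substring matching rule, the helper name select_quant_columns, and the example column names are assumptions made for this example, not the documented behaviour of xomics.

```python
def select_quant_columns(columns, groups, str_quant="log2_lfq"):
    """Return columns whose name contains str_quant and one of the groups.

    Hypothetical helper: substring matching is an assumption for illustration.
    """
    return [c for c in columns
            if str_quant in c and any(g in c for g in groups)]


# Made-up column names of a table with two groups and two replicates each
columns = ["protein_id", "gene_name",
           "log2_lfq_ctrl_1", "log2_lfq_ctrl_2",
           "log2_lfq_treat_1", "log2_lfq_treat_2"]

cols_quant = select_quant_columns(columns, groups=["ctrl", "treat"])
print(cols_quant)
```

The id and name columns are left out because they do not contain str_quant.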
- get_dict_qcol_group(df=None, groups=None)
Create a dictionary of quantification columns and the group each belongs to.
- Parameters:
- Returns:
Dictionary assigning names of columns with quantifications (keys) to group names (values)
- Return type:
dict_qcol_group
- get_dict_group_qcols(df=None, groups=None)
Create a dictionary of groups from df and their corresponding columns with quantifications.
- Parameters:
- Returns:
Dictionary assigning group names (keys) to list of columns with quantifications (values)
- Return type:
dict_group_qcols
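The two dictionaries described above are inverses of each other. The sketch below builds both from a list of column names; the helper names, the substring matching rule, and the example columns are assumptions for illustration only.

```python
def group_to_columns(columns, groups, str_quant="log2_lfq"):
    """Map each group name to its quantification columns (assumed substring rule)."""
    return {g: [c for c in columns if str_quant in c and g in c] for g in groups}


def column_to_group(dict_group_qcols):
    """Invert the mapping so that each quantification column points to its group."""
    return {c: g for g, cols in dict_group_qcols.items() for c in cols}


# Made-up column names
columns = ["protein_id",
           "log2_lfq_ctrl_1", "log2_lfq_ctrl_2",
           "log2_lfq_treat_1", "log2_lfq_treat_2"]

dict_group_qcols = group_to_columns(columns, ["ctrl", "treat"])
dict_qcol_group = column_to_group(dict_group_qcols)
print(dict_group_qcols)
print(dict_qcol_group)
```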
- filter_nan(df=None, groups=None, cols=None)
Filter missing values based on provided groups or columns (cols).
- Parameters:
- Returns:
The filtered DataFrame.
- Return type:
df
Notes
Two options for selecting filtering columns are provided (groups and cols) because removing samples with any missing value is a very strict filtering step.
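To make the note concrete, the sketch below filters a small table (a list of row dictionaries) first on all quantification columns and then on a subset. It is a plain-Python illustration with made-up data and a hypothetical helper drop_rows_with_missing; it does not call xomics.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_rows_with_missing(rows, cols):
    """Keep only rows without a missing value in any of the given columns."""
    return [row for row in rows if not any(is_missing(row[c]) for c in cols)]


nan = float("nan")
rows = [
    {"protein_id": "P1", "ctrl_1": 20.1, "ctrl_2": 20.3, "treat_1": 22.0, "treat_2": 21.8},
    {"protein_id": "P2", "ctrl_1": nan, "ctrl_2": 19.0, "treat_1": 18.5, "treat_2": 18.7},
    {"protein_id": "P3", "ctrl_1": 17.2, "ctrl_2": 17.0, "treat_1": nan, "treat_2": nan},
]

# Strict: any missing value in any quantification column removes the row
strict = drop_rows_with_missing(rows, ["ctrl_1", "ctrl_2", "treat_1", "treat_2"])
# Less strict: only the control columns are checked
ctrl_only = drop_rows_with_missing(rows, ["ctrl_1", "ctrl_2"])

print([r["protein_id"] for r in strict])
print([r["protein_id"] for r in ctrl_only])
```

Restricting the check to a subset of columns keeps rows that the strict filter would discard.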
- filter_groups(df=None, groups=None, min_pct=0.8)
Remove samples with missing values unless one group has at least min_pct non-missing values.
- Parameters:
- Returns:
The filtered DataFrame.
- Return type:
df
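One way to read the min_pct rule is sketched below: a row is kept when at least one group has a fraction of non-missing values of min_pct or more. This interpretation, the helper keep_row, and the data are assumptions for illustration; the exact rule applied by xomics may differ.

```python
import math


def keep_row(row, dict_group_qcols, min_pct=0.8):
    """Keep a row if at least one group has >= min_pct non-missing values."""
    for cols in dict_group_qcols.values():
        n_valid = sum(1 for c in cols if not math.isnan(row[c]))
        if n_valid / len(cols) >= min_pct:
            return True
    return False


nan = float("nan")
dict_group_qcols = {"ctrl": ["c1", "c2", "c3"], "treat": ["t1", "t2", "t3"]}
rows = [
    # Control complete, treatment entirely missing: kept
    {"id": "P1", "c1": 20.0, "c2": 20.5, "c3": 19.8, "t1": nan, "t2": nan, "t3": nan},
    # Both groups only 2/3 complete (below 0.8): removed
    {"id": "P2", "c1": 18.0, "c2": nan, "c3": 18.2, "t1": 17.0, "t2": nan, "t3": 17.4},
    # No missing values at all: kept
    {"id": "P3", "c1": 21.0, "c2": 21.1, "c3": 20.9, "t1": 23.0, "t2": 22.8, "t3": 23.1},
]

kept = [r["id"] for r in rows if keep_row(r, dict_group_qcols, min_pct=0.8)]
print(kept)
```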
- static filter_duplicated_names(df=None, col=None, str_split=';', split_names=False)
Filter for duplicated items in columns (e.g., names). Items can be split by str_split.
- Parameters:
df (Optional[DataFrame]) – The DataFrame with quantifications to filter. Rows typically correspond to proteins and columns to conditions.
col (Optional[str]) – Column from df in which to perform string splitting and filtering.
str_split (str) – The string character(s) to use for splitting string values in col.
split_names (bool) – Whether to split names using str_split in the specified col.
- Returns:
The modified and filtered DataFrame.
- Return type:
df
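The description leaves open how duplicates are resolved, so the sketch below fixes one possible behaviour as an explicit assumption: the first row for each name is kept, and with split_names=True an entry is split on str_split and its first item is used as the name. The helper drop_duplicated_names and the data are hypothetical and not taken from xomics.

```python
def drop_duplicated_names(rows, col, str_split=";", split_names=False):
    """Keep the first row for each name in `col` (assumed behaviour).

    With split_names=True the entry is split on str_split and its first
    item is used as the name before checking for duplicates.
    """
    seen = set()
    kept = []
    for row in rows:
        name = row[col]
        if split_names:
            name = name.split(str_split)[0]
        if name in seen:
            continue  # name already present, drop this row
        seen.add(name)
        kept.append({**row, col: name})
    return kept


rows = [
    {"protein_id": "P1", "gene_name": "ACTB"},
    {"protein_id": "P2", "gene_name": "TUBB;TUBB2A"},
    {"protein_id": "P3", "gene_name": "ACTB"},
    {"protein_id": "P4", "gene_name": "TUBB"},
]

filtered = drop_duplicated_names(rows, "gene_name", split_names=True)
print([(r["protein_id"], r["gene_name"]) for r in filtered])
```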
- static apply_log(df=None, cols=None, log2=True, neg=False)
Apply a logarithmic transformation to specified columns of a DataFrame.
- Parameters:
df (Optional[DataFrame]) – DataFrame containing data to transform.
cols (Optional[list]) – Names of columns to apply the logarithmic transformation to.
log2 (bool) – If True, apply a log2 transformation. Otherwise, apply a log10 transformation.
neg (bool) – If True, multiply the logarithmic result by -1.
- Returns:
DataFrame with specified columns log-transformed.
- Return type:
df
Notes
Make sure all values in the specified columns of df are > 0 before applying this function.
NaN values will remain NaN after the transformation.
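The effect of the log2 and neg options can be sketched on a plain list of values. The function log_transform below is a hypothetical stand-in that mirrors the parameter descriptions; it is not the xomics implementation and works on lists rather than DataFrame columns.

```python
import math


def log_transform(values, log2=True, neg=False):
    """Log-transform positive values (log2 or log10); NaN stays NaN."""
    log = math.log2 if log2 else math.log10
    sign = -1.0 if neg else 1.0
    return [v if math.isnan(v) else sign * log(v) for v in values]


# Intensities on a linear scale, transformed to log2
intensities = [8.0, 1024.0, float("nan")]
log2_vals = log_transform(intensities)

# p-values transformed to -log10, as used on the y-axis of a volcano plot
pvals = [0.01, 0.001]
neg_log10 = log_transform(pvals, log2=False, neg=True)

print(log2_vals)
print(neg_log10)
```

Values of zero or below would raise an error in this sketch, which is why the input has to be strictly positive.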
- static apply_exp(df=None, cols=None, base=2.0, neg=False)
Apply an exponential transformation to specified columns of a DataFrame.
- Parameters:
df (Optional[DataFrame]) – DataFrame containing data to transform.
cols (Optional[list]) – Names of columns to apply the exponential transformation to.
base (int) – The base of the exponential function. If base=2, apply a 2**x transformation, otherwise apply a 10**x transformation if base=10.
neg (bool) – If True, multiply the exponential result by -1.
- Returns:
DataFrame with specified columns exponentially transformed.
- Return type:
df
Notes
NaN values will remain NaN after the transformation.
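The exponential transformation is the inverse of a log transformation with the same base. The sketch below shows this on a plain list; exp_transform is a hypothetical helper that follows the parameter descriptions above and is not the xomics implementation.

```python
import math


def exp_transform(values, base=2.0, neg=False):
    """Raise base to each value; NaN stays NaN. With neg=True the result is negated."""
    sign = -1.0 if neg else 1.0
    return [v if math.isnan(v) else sign * base ** v for v in values]


# log2 intensities back to the linear scale
log2_vals = [3.0, 10.0, float("nan")]
linear = exp_transform(log2_vals, base=2.0)

# log10 values back to the linear scale
from_log10 = exp_transform([2.0, 3.0], base=10.0)

# neg=True negates the result of the exponential, not the exponent
negated = exp_transform([3.0], base=2.0, neg=True)

print(linear, from_log10, negated)
```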
- static add_significance(df=None, col_fc=None, col_pval=None, th_fc=0.5, th_pval=0.05)
Add a column indicating significance based on thresholds for fold change and p-value.
Three types of significance classes are defined:
Up: Significant hits that are ‘up-regulated’ (i.e., right quadrant of volcano plot)
Down: Significant hits that are ‘down-regulated’ (i.e., left quadrant of volcano plot)
Not Sig.: Hits that are not significant.
- Parameters:
df (Optional[DataFrame]) – DataFrame containing fold-change and p-values.
col_fc (Optional[str]) – Column name containing fold change values.
th_fc (float) – Threshold for fold-change, applied for negative and positive values.
th_pval (float) – Threshold for p-value, -log10 transformed before being applied.
- Returns:
DataFrame with added significance column.
- Return type:
df
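The three classes can be sketched as a small classification function. The sketch assumes that the p-value column holds -log10 p-values (so th_pval is transformed before comparison, as described for th_pval) and that thresholds are inclusive; both points, and the helper name classify, are assumptions rather than confirmed xomics behaviour.

```python
import math


def classify(log2_fc, neg_log10_pval, th_fc=0.5, th_pval=0.05):
    """Return 'Up', 'Down' or 'Not Sig.' for one hit (assumed inclusive thresholds)."""
    th = -math.log10(th_pval)  # p-value threshold on the -log10 scale (about 1.3)
    if neg_log10_pval >= th:
        if log2_fc >= th_fc:
            return "Up"
        if log2_fc <= -th_fc:
            return "Down"
    return "Not Sig."


# (log2 fold change, -log10 p-value) pairs with made-up values
hits = [(1.2, 3.0), (-0.9, 2.0), (2.0, 0.5), (0.1, 4.0)]
labels = [classify(fc, p) for fc, p in hits]
print(labels)
```

A hit needs to pass both thresholds: a large fold change with a weak p-value, or a strong p-value with a small fold change, ends up as "Not Sig.".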
- run(df=None, groups=None, groups_ctrl=None, pvals_correction=None, pvals_neg_log10=True)
Perform pairwise t-tests for groups to obtain -log10 p-values and log2 fold changes, with optional p-value correction, nan policy, and log-scale output.
- Parameters:
df (Optional[DataFrame]) – DataFrame with quantifications. Rows typically correspond to proteins and columns to conditions.
groups (Optional[list]) – List with names grouping conditions from df columns.
groups_ctrl (Optional[list]) – List with names of control groups grouping conditions from df columns.
pvals_correction (Optional[str]) – Correction method for t-tests {“bonferroni”, “sidak”, “holm”, “hommel”, “fdr_bh”}.
pvals_neg_log10 (bool) – Whether to return p-values in -log10 scale.
- Returns:
DataFrame with p-values and log2 fold changes for each group comparison.
- Return type:
df_fc
Notes
Fold changes (FC) and p-values will be computed for each group in groups compared against each group in groups_ctrl (group/group_ctrl), where self-comparison is omitted.
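The pairing of groups and the scales of the output can be sketched in plain Python. The sketch assumes that the log2 fold change is the difference of the mean log2 values (group minus control) and uses Bonferroni as the example correction. The t-test itself is left out because the standard library has no t-distribution, so the raw p-values below are made up. All helper names are hypothetical.

```python
import math
from itertools import product
from statistics import mean


def comparison_pairs(groups, groups_ctrl):
    """All (group, control) pairs, skipping self-comparisons."""
    return [(g, c) for g, c in product(groups, groups_ctrl) if g != c]


def log2_fold_change(log2_group, log2_ctrl):
    """Difference of mean log2 values, i.e. log2(group / control)."""
    return mean(log2_group) - mean(log2_ctrl)


def bonferroni_neg_log10(pvals):
    """Bonferroni-correct p-values (capped at 1) and return them as -log10."""
    n = len(pvals)
    return [-math.log10(min(p * n, 1.0)) for p in pvals]


pairs = comparison_pairs(["ctrl", "treat_a", "treat_b"], ["ctrl"])
fc = log2_fold_change([22.0, 21.0], [20.0, 20.0])
scores = bonferroni_neg_log10([0.001, 0.5])  # made-up raw p-values

print(pairs)
print(fc)
print(scores)
```

Because "ctrl" appears in both lists, the pair ("ctrl", "ctrl") is skipped, which corresponds to the omitted self-comparison.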