xomics.pRank

class xomics.pRank(col_id='protein_id', col_name='gene_name', str_quant='log2_lfq')

Bases: object

Scoring and ranking of proteins in (prote)omics data using proteomics (P), enrichment (E), and imputation confidence (C) scores.

Parameters:

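col_id (str) – Name of column with identifiers in DataFrame.
col_name (str) – Name of column with gene names in DataFrame.
str_quant (str) – Identifier for the LFQ (label-free quantification) columns in DataFrame.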

__init__(col_id='protein_id', col_name='gene_name', str_quant='log2_lfq')

Parameters:

col_id (str) – Name of column with identifiers in DataFrame.
col_name (str) – Name of column with gene names in DataFrame.
str_quant (str) – Identifier for the LFQ (label-free quantification) columns in DataFrame.
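A minimal sketch of how str_quant might be used to pick out the quantification columns, assuming it is matched as a substring of the column names. The helper select_quant_columns and the toy DataFrame are hypothetical and not part of xomics.

```python
import pandas as pd


def select_quant_columns(df: pd.DataFrame, str_quant: str = "log2_lfq") -> list:
    """Hypothetical helper: return the names of columns containing str_quant."""
    # ASSUMPTION: str_quant is matched as a substring of the column names
    return [col for col in df.columns if str_quant in col]


# Toy data for illustration only
df = pd.DataFrame({
    "protein_id": ["P1", "P2"],
    "gene_name": ["GENE1", "GENE2"],
    "log2_lfq_A1": [20.1, 21.3],
    "log2_lfq_A2": [19.8, None],
})
quant_cols = select_quant_columns(df)
```

Here quant_cols holds the two log2_lfq columns, while the id and name columns (col_id, col_name) are left out.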

Methods

`__init__`([col_id, col_name, str_quant])	Initialize with the names of the id and gene name columns and the LFQ column identifier.
`c_score`([df_imp, ids, col_id])	Obtain the protein confidence score (C score) from cImpute output.
`e_hits`([ids, id_lists, terms, ...])	Get association matrix for protein ids and enrichment terms.
`e_score`([df_fc, col_name, df_enrich, ...])	Calculate the single protein enrichment score (E score) by first z-normalizing fold enrichment scores and p-values, and then integrating them protein-wise to obtain a min-max normalized ranking score.
`p_score`([df_fc, col_fc, col_pval])	Calculate the single protein proteomics ranking score (P score) by first z-normalizing fold change scores and p-values, and then integrating them protein-wise to obtain min-max normalized ranking scores.

static p_score(df_fc=None, col_fc=None, col_pval=None)

Calculate the single protein proteomics ranking score (P score) by first z-normalizing fold change scores and p-values, and then integrating them protein-wise to obtain min-max normalized ranking scores.

Parameters:

df_fc (Optional[DataFrame]) – DataFrame with fold-change and p-values.
col_fc (Optional[str]) – Name of column from df_fc with fold change values for each protein (log2 fold change recommended).
col_pval (Optional[str]) – Name of column from df_fc with p-values for each protein (-log10 transformed p-values recommended).

Returns:

df_fc – Input DataFrame with the P score for each protein given in the ‘P-Score’ column.

Return type:

pandas.DataFrame
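The P score steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the xomics implementation: it assumes that "integrating them protein-wise" means averaging the two z-scores, and that both columns are oriented so that larger values are more relevant (log2 fold changes and -log10 p-values). The function p_score_sketch is hypothetical.

```python
import pandas as pd


def _z_norm(x: pd.Series) -> pd.Series:
    # z-normalization: zero mean, unit (population) standard deviation
    return (x - x.mean()) / x.std(ddof=0)


def _min_max(x: pd.Series) -> pd.Series:
    # rescale to the range [0, 1]
    return (x - x.min()) / (x.max() - x.min())


def p_score_sketch(df_fc: pd.DataFrame, col_fc: str, col_pval: str) -> pd.DataFrame:
    """Hypothetical re-implementation of the documented P score steps."""
    df = df_fc.copy()  # leave the caller's DataFrame untouched
    # ASSUMPTION: protein-wise integration = mean of the two z-scores
    combined = (_z_norm(df[col_fc]) + _z_norm(df[col_pval])) / 2
    df["P-Score"] = _min_max(combined)
    return df


# Toy data for illustration only
df_fc = pd.DataFrame({
    "gene_name": ["GENE1", "GENE2", "GENE3"],
    "log2_fc": [1.0, 2.0, 3.0],
    "neg_log10_pval": [1.0, 2.0, 3.0],
})
df_ranked = p_score_sketch(df_fc, col_fc="log2_fc", col_pval="neg_log10_pval")
```

With these toy inputs the P scores are 0.0, 0.5 and 1.0. If a column is constant, its standard deviation is zero and this sketch yields NaN, so a real implementation would need to guard that case.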

static e_score(df_fc=None, col_name=None, df_enrich=None, col_fe=None, col_pval=None, col_name_lists=None)

Calculate the single protein enrichment score (E score) by first z-normalizing fold enrichment scores and p-values, and then integrating them protein-wise to obtain a min-max normalized ranking score.

Parameters:

df_fc (Optional[DataFrame]) – DataFrame with fold-change and p-values.
col_name (Optional[str]) – Name of column from df_fc with protein names.
df_enrich (Optional[DataFrame]) – DataFrame with fold enrichment and p-values for each enrichment term.
col_fe (Optional[str]) – Name of column from df_enrich with fold enrichment values for each enrichment term (log2 fold enrichment recommended).
col_pval (Optional[str]) – Name of column from df_enrich with p-values for each term (-log10 transformed p-values recommended).
col_name_lists (Optional[str]) – Name of column from df_enrich with protein name lists. Lists should contain names from col_name.

Returns:

df_fc – Input DataFrame with the E score for each protein given in the ‘E-Score’ column.

Return type:

pandas.DataFrame
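A similar sketch for the E score. The description leaves open how term-level scores are integrated per protein, so this sketch assumes that each term's z-normalized fold enrichment and p-value are averaged and min-max scaled into a term score, and that a protein's raw E score is the sum of the scores of all terms whose name list contains it. The function e_score_sketch and the output column 'E-Score' are assumptions of this sketch.

```python
import pandas as pd


def _z_norm(x: pd.Series) -> pd.Series:
    # z-normalization: zero mean, unit (population) standard deviation
    return (x - x.mean()) / x.std(ddof=0)


def _min_max(x: pd.Series) -> pd.Series:
    # rescale to the range [0, 1]
    return (x - x.min()) / (x.max() - x.min())


def e_score_sketch(df_fc, col_name, df_enrich, col_fe, col_pval, col_name_lists):
    """Hypothetical re-implementation of the documented E score steps."""
    # ASSUMPTION: term score = min-max scaled mean of the two z-scores
    z_mean = (_z_norm(df_enrich[col_fe]) + _z_norm(df_enrich[col_pval])) / 2
    term_scores = _min_max(z_mean)
    # ASSUMPTION: protein-wise integration = sum over all terms listing the protein
    totals = {}
    for names, score in zip(df_enrich[col_name_lists], term_scores):
        for name in names:
            totals[name] = totals.get(name, 0.0) + score
    df = df_fc.copy()
    # proteins not listed in any term get a raw score of 0
    raw = df[col_name].map(totals).fillna(0.0)
    df["E-Score"] = _min_max(raw)
    return df


# Toy data for illustration only
df_fc = pd.DataFrame({"gene_name": ["GENE1", "GENE2", "GENE3", "GENE4"]})
df_enrich = pd.DataFrame({
    "term": ["term1", "term2"],
    "log2_fe": [2.0, 1.0],
    "neg_log10_pval": [2.0, 1.0],
    "genes": [["GENE1", "GENE2"], ["GENE2", "GENE3"]],
})
df_ranked = e_score_sketch(df_fc, "gene_name", df_enrich, "log2_fe", "neg_log10_pval", "genes")
```

Scaling term scores to [0, 1] before summing keeps them non-negative, so a protein that appears in no term (GENE4 here) cannot outrank one that appears in a weakly enriched term.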

static c_score(df_imp=None, ids=None, col_id=None)

Obtain the protein confidence score (C score) from cImpute output.

Parameters:

df_imp (Optional[DataFrame]) – DataFrame with imputed values from cImpute.
ids (Optional[ArrayLike1D]) – List or array of protein identifiers, where ArrayLike1D is Union[Sequence[Union[int, float]], ndarray, Series].
col_id (Optional[str]) – Name of the id column in ‘df_imp’. If None, the index is used as ids.

Returns:

c_scores – Array of confidence scores (C scores) from imputation for each protein.
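The description does not say how the C scores are stored in the cImpute output. The sketch below only illustrates the documented id handling (look up ids in col_id, or in the index if col_id is None), assuming a hypothetical per-protein confidence column named 'CS'. Both that column name and the function c_score_sketch are hypothetical.

```python
import numpy as np
import pandas as pd


def c_score_sketch(df_imp: pd.DataFrame, ids, col_id=None, col_cs: str = "CS") -> np.ndarray:
    """Hypothetical lookup of per-protein confidence scores for the given ids."""
    # HYPOTHETICAL: col_cs is a per-protein confidence column in the imputation output
    if col_id is not None:
        scores = df_imp.set_index(col_id)[col_cs]
    else:
        # documented behaviour: if col_id is None, the index holds the ids
        scores = df_imp[col_cs]
    # align to the requested ids; ids missing from df_imp become NaN
    return scores.reindex(ids).to_numpy()


# Toy data for illustration only
df_imp = pd.DataFrame({"protein_id": ["P1", "P2", "P3"], "CS": [0.9, 0.5, 0.1]})
c_scores = c_score_sketch(df_imp, ids=["P3", "P1"], col_id="protein_id")
```

The returned array follows the order of ids rather than the row order of df_imp.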

static e_hits(ids=None, id_lists=None, terms=None, terms_sub_list=None, n_ids=None, n_terms=None, sort_alpha=False)

Get association matrix for protein ids and enrichment terms.

Build a matrix of associations between protein/gene ids and id sets, where each id set is the protein/gene list associated with a specific biological term obtained from an enrichment analysis (referred to as ‘enrichment terms’), such as a GO term or KEGG pathway.

Parameters:

ids (array-like) – Array of protein identifiers.
id_lists (list of lists) – List of protein identifier sets from enrichment analysis (e.g., set of proteins linked to specific GO term).
terms (list or array-like) – List of enrichment terms corresponding to id_lists.
terms_sub_list (list or array-like, default = None) – Sublist of enrichment terms (must be a subset of ‘terms’). If not None, only these terms are kept in the output.
n_ids (integer, default = None) – If not None, keep only the ‘n_ids’ genes/proteins from ‘ids’ with the highest number of associations.
n_terms (integer, default = None) – If not None, keep only the ‘n_terms’ terms from ‘terms’ with the highest number of associations.
sort_alpha (bool, default = False) – Sort values alphabetically (if True) or in descending order of hit counts (if False).

Returns:

df_e_hit – DataFrame with links between gene/protein ids and enrichment terms.

Return type:

pandas.DataFrame

Examples

>>> import xomics
>>> xomics.pRank.e_hits(ids=['gene1', 'gene2', 'gene3'], id_lists=[['gene1', 'gene2'], ['gene2', 'gene3']],
...                     terms=['term1', 'term2'], n_ids=2, n_terms=1)
       gene1  gene2
term1      1      1
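A minimal pandas sketch of how such an association matrix could be built. It is not the xomics implementation: the filtering order (keep terms_sub_list, then the n_terms terms with the most hits, then the n_ids ids with the most hits among the kept terms) and the tie handling (ties keep their input order) are assumptions, chosen so that the sketch reproduces the example output above. The function e_hits_sketch is hypothetical.

```python
import pandas as pd


def _by_hits(counts: pd.Series) -> pd.Index:
    # labels in descending order of hit counts; the stable sort keeps input order for ties
    return (-counts).sort_values(kind="stable").index


def e_hits_sketch(ids, id_lists, terms, terms_sub_list=None,
                  n_ids=None, n_terms=None, sort_alpha=False):
    """Hypothetical sketch: binary term x id association matrix."""
    ids, terms = list(ids), list(terms)
    id_sets = [set(id_list) for id_list in id_lists]
    # rows = enrichment terms, columns = ids; 1 marks an association
    df = pd.DataFrame([[int(i in s) for i in ids] for s in id_sets],
                      index=terms, columns=ids)
    if terms_sub_list is not None:
        keep = set(terms_sub_list)
        df = df.loc[[t for t in terms if t in keep]]
    # ASSUMPTION: rank and filter terms first, then ids within the kept terms
    df = df.loc[_by_hits(df.sum(axis=1))[:n_terms]]
    df = df.loc[:, _by_hits(df.sum(axis=0))[:n_ids]]
    if sort_alpha:
        df = df.sort_index(axis=0).sort_index(axis=1)
    return df


df_e_hit = e_hits_sketch(
    ids=["gene1", "gene2", "gene3"],
    id_lists=[["gene1", "gene2"], ["gene2", "gene3"]],
    terms=["term1", "term2"],
    n_ids=2,
    n_terms=1,
)
```

Slicing an Index with [:None] keeps every label, so leaving n_ids or n_terms as None disables the corresponding filter while still ordering rows and columns by hit count.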