xomics.pRank

class xomics.pRank(col_id='protein_id', col_name='gene_name', str_quant='log2_lfq')

Bases: object

Hybrid imputation algorithm for missing values (MVs) in (prote)omics data.

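Parameters:
  • col_id (str) – Name of column with identifiers in DataFrame.

  • col_name (str) – Name of column with sample names in DataFrame.

  • str_quant (str) – Identifier for the LFQ columns in the DataFrame.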

__init__(col_id='protein_id', col_name='gene_name', str_quant='log2_lfq')
Parameters:
  • col_id (str) – Name of column with identifiers in DataFrame.

  • col_name (str) – Name of column with sample names in DataFrame.

  • str_quant (str) – Identifier for the LFQ columns in the DataFrame.

Methods

__init__([col_id, col_name, str_quant])

c_score([df_imp, ids, col_id])

Obtain the protein confidence score (C score) from cImpute output.

e_hits([ids, id_lists, terms, ...])

Get association matrix for protein ids and enrichment terms.

e_score([df_fc, col_name, df_enrich, ...])

Calculate the single protein enrichment score (E score) by first z-normalizing fold enrichment scores and p-values, and then integrating them protein-wise to obtain a min-max normalized ranking score.

p_score([df_fc, col_fc, col_pval])

Calculate the single protein ranking score (P score) by first z-normalizing fold change scores and p-values, and then integrating them protein-wise to obtain min-max normalized ranking scores.

static p_score(df_fc=None, col_fc=None, col_pval=None)

Calculate the single protein ranking score (P score) by first z-normalizing fold change scores and p-values, and then integrating them protein-wise to obtain min-max normalized ranking scores.

Parameters:
  • df_fc (Optional[DataFrame]) – DataFrame with fold-change and p-values.

  • col_fc (Optional[str]) – Name of column from df_fc with fold change values for each protein (log2 transformed values recommended).

  • col_pval (Optional[str]) – Name of column from df_fc with p-values for each protein (-log10 transformed values recommended).

Returns:

Input DataFrame with the P score for each protein given in the ‘P-Score’ column.

Return type:

df_fc
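The three steps described for the P score (z-normalize, integrate per protein, min-max normalize) can be illustrated with a minimal sketch. It is not the xomics implementation: the function name p_score_sketch, the example column names, and the choice to integrate the two z-scores by taking their mean are all assumptions made for illustration.

```python
import pandas as pd


def p_score_sketch(df_fc, col_fc, col_pval):
    """Illustrative P score: z-normalize, average per protein, min-max scale."""
    df = df_fc.copy()
    # z-normalize each column (population standard deviation, ddof=0)
    z_fc = (df[col_fc] - df[col_fc].mean()) / df[col_fc].std(ddof=0)
    z_pval = (df[col_pval] - df[col_pval].mean()) / df[col_pval].std(ddof=0)
    # Assumption: the two z-scores are integrated per protein by their mean
    combined = (z_fc + z_pval) / 2
    # min-max normalize the combined score to the range [0, 1]
    df["P-Score"] = (combined - combined.min()) / (combined.max() - combined.min())
    return df


# Hypothetical input: log2 fold changes and -log10 p-values for three proteins
df_example = pd.DataFrame({
    "protein_id": ["P1", "P2", "P3"],
    "log2_fc": [0.5, 1.0, 3.0],
    "neg_log10_pval": [0.3, 1.2, 4.0],
})
df_scored = p_score_sketch(df_example, col_fc="log2_fc", col_pval="neg_log10_pval")
```

In this sketch the protein with the largest fold change and the most significant p-value receives a score of 1, and the weakest one a score of 0.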

static e_score(df_fc=None, col_name=None, df_enrich=None, col_fe=None, col_pval=None, col_name_lists=None)

Calculate the single protein enrichment score (E score) by first z-normalizing fold enrichment scores and p-values, and then integrating them protein-wise to obtain a min-max normalized ranking score.

Parameters:
  • df_fc (Optional[DataFrame]) – DataFrame with fold-change and p-values.

  • col_name (Optional[str]) – Name of column from df_fc with protein names.

  • df_enrich (Optional[DataFrame]) – DataFrame with fold enrichment and p-values for each enrichment term.

  • col_fe (Optional[str]) – Name of column from df_enrich with fold enrichment values for each enrichment term (log2 fold recommended).

  • col_pval (Optional[str]) – Name of column from df_enrich with p-values for each term (-log10 transformed values recommended).

  • col_name_lists (Optional[str]) – Name of column from df_enrich with protein name lists. Lists should contain names from col_name.

Returns:

Input DataFrame with the E score for each protein given in the ‘E-Score’ column.

Return type:

df_fc
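How term-level enrichment values can be mapped back to single proteins is sketched below. This is an illustration under stated assumptions, not the xomics implementation: the function name e_score_sketch, the output column name, the use of the mean to combine z-scores, averaging over all terms that list a protein, and assigning 0 to proteins that appear in no term are all hypothetical choices.

```python
import pandas as pd


def e_score_sketch(df_fc, col_name, df_enrich, col_fe, col_pval, col_name_lists):
    """Illustrative E score: term-level z-scores averaged per protein, min-max scaled."""
    df = df_fc.copy()
    enr = df_enrich.copy()
    # z-normalize fold enrichment and p-values across enrichment terms
    for col in (col_fe, col_pval):
        enr[col] = (enr[col] - enr[col].mean()) / enr[col].std(ddof=0)
    # Assumption: a term's score is the mean of its two z-scores
    term_scores = (enr[col_fe] + enr[col_pval]) / 2
    raw = []
    for name in df[col_name]:
        # Select the terms whose protein name list contains this protein
        mask = enr[col_name_lists].apply(lambda names: name in names)
        # Assumption: average over matching terms (NaN if there are none)
        raw.append(term_scores[mask].mean())
    raw = pd.Series(raw, index=df.index)
    # min-max normalize; proteins without any term are set to 0 (assumption)
    scaled = (raw - raw.min()) / (raw.max() - raw.min())
    df["E-Score"] = scaled.fillna(0.0)
    return df


# Hypothetical input data
df_proteins = pd.DataFrame({"gene_name": ["A", "B", "C", "D"]})
df_terms = pd.DataFrame({
    "term": ["T1", "T2"],
    "log2_fe": [1.0, 3.0],
    "neg_log10_pval": [1.0, 3.0],
    "genes": [["A", "B"], ["B", "C"]],
})
df_e = e_score_sketch(df_proteins, "gene_name", df_terms,
                      "log2_fe", "neg_log10_pval", "genes")
```

Here protein C, linked only to the stronger term, ranks highest, while B, linked to both terms, lands in between.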

static c_score(df_imp=None, ids=None, col_id=None)

Obtain the protein confidence score (C score) from cImpute output.

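Parameters:
  • df_imp (Optional[DataFrame]) – DataFrame with cImpute output.

  • ids (Optional[array-like]) – Protein identifiers.

  • col_id (Optional[str]) – Name of column with identifiers in df_imp.

Returns: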

Array of confidence scores (C scores) from imputation for each protein.

Return type:

c_scores

static e_hits(ids=None, id_lists=None, terms=None, terms_sub_list=None, n_ids=None, n_terms=None, sort_alpha=False)

Get association matrix for protein ids and enrichment terms.

Get matrix with associations between protein/gene ids and id sets representing protein/gene lists associated with specific biological terms obtained from an enrichment analysis (referred to as ‘enrichment terms’) such as GO or KEGG pathway terms.

Parameters:
  • ids (array-like) – Array of protein identifiers.

  • id_lists (list of lists) – List of protein identifier sets from enrichment analysis (e.g., set of proteins linked to specific GO term).

  • terms (list or array-like) – List of enrichment terms matching id_lists.

  • terms_sub_list (list or array-like, default = None) – Sublist of enrichment terms (must be a subset of ‘terms’). If not None, it is used to filter the output.

  • n_ids (integer, default = None) – If not None, keep only the ‘n_ids’ genes/proteins from ‘ids’ with the highest number of associations.

  • n_terms (integer, default = None) – If not None, keep only the ‘n_terms’ terms from ‘terms’ with the highest number of associations.

  • sort_alpha (bool, default = False) – Sort values alphabetically (if True) or in descending order of hit counts (if False).

Returns:

df_e_hit – Data frame with links between gene/protein ids and ‘enrichment’ terms

Return type:

pandas.DataFrame

Examples

>>> e_hits(ids=['gene1', 'gene2', 'gene3'], id_lists=[['gene1', 'gene2'], ['gene2', 'gene3']],
... terms=['term1', 'term2'], n_ids=2, n_terms=1)
       gene1  gene2
term1      1      1
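The unfiltered association matrix behind such an output can be illustrated with plain pandas. This is a sketch, not the library's code: the helper name association_matrix is hypothetical, and the filtering and sorting controlled by terms_sub_list, n_ids, n_terms, and sort_alpha are left out.

```python
import pandas as pd


def association_matrix(ids, id_lists, terms):
    """Build a terms-by-ids matrix with 1 where an id belongs to a term's id list."""
    id_sets = [set(id_list) for id_list in id_lists]
    # One column per id, one row per term; 1 marks membership, 0 absence
    data = {id_: [int(id_ in id_set) for id_set in id_sets] for id_ in ids}
    return pd.DataFrame(data, index=terms)


df_hits = association_matrix(
    ids=["gene1", "gene2", "gene3"],
    id_lists=[["gene1", "gene2"], ["gene2", "gene3"]],
    terms=["term1", "term2"],
)
```

Column sums of this matrix give the number of terms per id, and row sums the number of ids per term, which are the hit counts that filters such as n_ids and n_terms would rank by.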