xomics.pRank
- class xomics.pRank(col_id='protein_id', col_name='gene_name', str_quant='log2_lfq')[source]
Bases: object
Hybrid imputation algorithm for missing values (MVs) in (prote)omics data.
Methods
__init__([col_id, col_name, str_quant])
c_score([df_imp, ids, col_id]) – Obtain the protein confidence score (C score) from cImpute output.
e_hits([ids, id_lists, terms, ...]) – Get association matrix for protein ids and enrichment terms.
e_score([df_fc, col_name, df_enrich, ...]) – Calculate the single-protein enrichment score (E score) by first z-normalizing fold enrichment scores and p-values, and then integrating them protein-wise to obtain a min-max normalized ranking score.
p_score([df_fc, col_fc, col_pval]) – Calculate the single-protein ranking score (P score) by first z-normalizing fold change scores and p-values, and then integrating them protein-wise to obtain min-max normalized ranking scores.
- static p_score(df_fc=None, col_fc=None, col_pval=None)[source]
Calculate the single-protein ranking score (P score) by first z-normalizing fold change scores and p-values, and then integrating them protein-wise to obtain min-max normalized ranking scores.
- Parameters:
- Returns:
Input DataFrame with the P score for each protein given in the ‘P-Score’ column.
- Return type:
df_fc
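The description above names three steps: z-normalization, protein-wise integration, and min-max normalization. The following is a minimal sketch of those steps, not the library's implementation. The integration rule (mean of the two z-scores), the helper names `z_norm` and `p_score_sketch`, and the example column names are assumptions made for illustration.

```python
import pandas as pd


def z_norm(values: pd.Series) -> pd.Series:
    """Z-normalize a column (zero mean, unit standard deviation)."""
    return (values - values.mean()) / values.std(ddof=0)


def p_score_sketch(df_fc: pd.DataFrame, col_fc: str, col_pval: str) -> pd.DataFrame:
    """Illustrative P score: z-normalize, integrate per protein, min-max scale."""
    df = df_fc.copy()
    # Assumption: the two z-scores are integrated by taking their mean
    combined = (z_norm(df[col_fc]) + z_norm(df[col_pval])) / 2
    # Min-max normalization maps the combined score onto [0, 1]
    df["P-Score"] = (combined - combined.min()) / (combined.max() - combined.min())
    return df


# Hypothetical input: log2 fold changes and -log10 p-values for three proteins
df_example = pd.DataFrame(
    {"log2_fc": [0.1, 1.5, 3.0], "neg_log10_pval": [0.2, 1.0, 4.0]},
    index=["protA", "protB", "protC"],
)
df_scored = p_score_sketch(df_example, col_fc="log2_fc", col_pval="neg_log10_pval")
print(df_scored)
```

In this sketch the protein with the largest fold change and the strongest p-value receives a score of 1, and the weakest receives 0.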
- static e_score(df_fc=None, col_name=None, df_enrich=None, col_fe=None, col_pval=None, col_name_lists=None)[source]
Calculate the single protein enrichment score (E score) by first z-normalizing fold enrichment scores and p-values, and then integrating them protein-wise to obtain a min-max normalized ranking score.
- Parameters:
df_fc (Optional[DataFrame]) – DataFrame with fold changes and p-values.
col_name (Optional[str]) – Name of column from df_fc with protein names.
df_enrich (Optional[DataFrame]) – DataFrame with fold enrichment and p-values for each enrichment term.
col_fe (Optional[str]) – Name of column from df_enrich with fold enrichment values for each enrichment term (log2 fold recommended).
col_pval (Optional[str]) – Name of column from df_enrich with p-values for each term (-log10 recommended).
col_name_lists (Optional[str]) – Name of column from df_enrich with protein name lists. Lists should contain names from col_name.
- Returns:
Input DataFrame with E-score for each protein given in ‘P-Score’ column.
- Return type:
df_fc
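To make the protein-wise integration of term-level values concrete, here is a hedged sketch. It is not the library's implementation. It assumes that each term's score is the mean of its two z-normalized values and that a protein's raw score is the sum over all terms listing it. The function name `e_score_sketch`, the output column `E_score_sketch`, and all example column names are hypothetical.

```python
import pandas as pd


def e_score_sketch(df_fc, col_name, df_enrich, col_fe, col_pval, col_name_lists):
    """Illustrative E score: term-level z-scores summed per protein, min-max scaled."""

    def z_norm(values):
        return (values - values.mean()) / values.std(ddof=0)

    # Assumption: each term's score is the mean of its two z-normalized values
    term_scores = (z_norm(df_enrich[col_fe]) + z_norm(df_enrich[col_pval])) / 2
    # Assumption: a protein's raw score is the sum over all terms that list it
    raw = pd.Series(0.0, index=df_fc[col_name])
    for score, names in zip(term_scores, df_enrich[col_name_lists]):
        for name in names:
            if name in raw.index:
                raw.loc[name] += score
    df = df_fc.copy()
    # Min-max normalization maps the raw scores onto [0, 1]
    scaled = (raw - raw.min()) / (raw.max() - raw.min())
    df["E_score_sketch"] = scaled.to_numpy()  # hypothetical column name
    return df


# Hypothetical inputs: three proteins and two enrichment terms
df_proteins = pd.DataFrame({"gene_name": ["g1", "g2", "g3"]})
df_terms = pd.DataFrame(
    {
        "log2_fe": [1.0, 3.0],
        "neg_log10_pval": [2.0, 6.0],
        "genes": [["g1", "g2"], ["g2", "g3"]],
    },
    index=["term1", "term2"],
)
df_e = e_score_sketch(
    df_proteins, "gene_name", df_terms, "log2_fe", "neg_log10_pval", "genes"
)
print(df_e)
```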
- static c_score(df_imp=None, ids=None, col_id=None)[source]
Obtain the protein confidence score (C score) from cImpute output.
- Parameters:
df_imp (Optional[DataFrame]) – DataFrame with imputed values from cImpute.
ids (Optional[ArrayLike1D], i.e., Sequence[Union[int, float]], ndarray, or Series) – List or array of protein identifiers.
col_id (Optional[str]) – Name of id column from ‘df_imp’. If None, the index will be used for ids.
- Returns:
Array of confidence scores (C scores) from imputation for each protein.
- Return type:
c_scores
- static e_hits(ids=None, id_lists=None, terms=None, terms_sub_list=None, n_ids=None, n_terms=None, sort_alpha=False)[source]
Get association matrix for protein ids and enrichment terms.
Get matrix with associations between protein/gene ids and id sets representing protein/gene lists associated with specific biological terms obtained from an enrichment analysis (referred to as ‘enrichment terms’) such as GO or KEGG pathway terms.
- Parameters:
ids (array-like) – Array of protein identifiers.
id_lists (list of lists) – List of protein identifier sets from enrichment analysis (e.g., set of proteins linked to specific GO term).
terms (list or array-like) – List of enrichment terms matching id_lists.
terms_sub_list (list or array-like, default = None) – Sublist of enrichment terms (must be a subset of ‘terms’). If not None, it is used to filter the output.
n_ids (integer, default = None) – If not None, filter results for the ‘n_ids’ genes/proteins from ‘ids’ with the highest number of associations.
n_terms (integer, default = None) – If not None, filter results for the ‘n_terms’ terms from ‘terms’ with the highest number of associations.
sort_alpha (bool, default = False) – Sort values alphabetically (if True) or in descending order of hit counts (if False).
- Returns:
df_e_hit – Data frame with links between gene/protein ids and ‘enrichment’ terms
- Return type:
Examples
>>> e_hits(ids=['gene1', 'gene2', 'gene3'], id_lists=[['gene1', 'gene2'], ['gene2', 'gene3']],
...        terms=['term1', 'term2'], n_ids=2, n_terms=1)
       gene1  gene2
term1      1      1
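As a rough illustration of what such an association matrix contains, the sketch below builds one with pandas and applies the two count-based filters. It is not the library's code. The names `e_hits_sketch` and `top_labels` are hypothetical, ties are assumed to be broken by original order, and `terms_sub_list` and `sort_alpha` are left out.

```python
import pandas as pd


def top_labels(counts: pd.Series, n: int) -> list:
    """Return the n labels with the highest counts, kept in their original order."""
    # Python's sorted() is stable, so ties keep their original order (assumption)
    ranked = sorted(counts.index, key=lambda label: -counts.loc[label])
    keep = set(ranked[:n])
    return [label for label in counts.index if label in keep]


def e_hits_sketch(ids, id_lists, terms, n_ids=None, n_terms=None):
    """Illustrative association matrix: rows are terms, columns are ids."""
    rows = []
    for id_list in id_lists:
        members = set(id_list)
        # 1 if the id belongs to the term's id list, else 0
        rows.append([int(i in members) for i in ids])
    df = pd.DataFrame(rows, index=terms, columns=ids)
    if n_ids is not None:
        # Keep the ids associated with the most terms
        df = df[top_labels(df.sum(axis=0), n_ids)]
    if n_terms is not None:
        # Keep the terms associated with the most remaining ids
        df = df.loc[top_labels(df.sum(axis=1), n_terms)]
    return df


df_hits = e_hits_sketch(
    ids=["gene1", "gene2", "gene3"],
    id_lists=[["gene1", "gene2"], ["gene2", "gene3"]],
    terms=["term1", "term2"],
    n_ids=2,
    n_terms=1,
)
print(df_hits)
```

With the inputs from the example above, this sketch keeps gene1 and gene2 as columns and term1 as the only row.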