Article

Graph Algorithms for Condensing and Consolidating Gene Set Analysis Results

Journal

MOLECULAR & CELLULAR PROTEOMICS
Volume 18, Issue 8, Pages S141-S152

Publisher

AMER SOC BIOCHEMISTRY MOLECULAR BIOLOGY INC
DOI: 10.1074/mcp.TIR118.001263

Funding

  1. National Cancer Institute Clinical Proteomic Tumor Analysis Consortium (CPTAC) [U24CA210954]
  2. Cancer Prevention & Research Institute of Texas [CPRIT RR160027]
  3. McNair Medical Institute at The Robert and Janice McNair Foundation

Gene set analysis plays a critical role in the functional interpretation of omics data. Although it is typically performed for one omics experiment at a time, there is an increasing need to combine gene set analysis results from multiple experiments performed on the same or different omics platforms, as in multi-omics studies. Integrating results from multiple experiments is challenging, and annotation redundancy between gene sets further obscures clear conclusions. We propose using a weighted set cover algorithm to reduce the redundancy of gene sets identified in a single experiment. We then use affinity propagation to consolidate similar gene sets identified from multiple experiments into clusters and to automatically determine the most representative gene set for each cluster. Using three examples from over-representation analysis and gene set enrichment analysis, we show that weighted set cover outperforms a previously published set cover method and reduces the number of gene sets by 52-77%. Restricting each gene set to the genes that drive its enrichment (the overlap between the input gene list and the enriched gene set in over-representation analysis, and the leading-edge genes in gene set enrichment analysis) further reduces the number of gene sets. A use case combining enrichment analysis results from RNA-Seq and proteomics data comparing basal and luminal A breast cancer samples highlights the known difference in proliferation and DNA damage response. Finally, we apply these algorithms to a pan-cancer survival analysis. The analysis reveals prognosis-related pathways common to multiple cancer types or specific to individual cancer types, as well as pathways associated with prognosis in opposite directions in different cancer types. We implemented the two algorithms in an R package, Sumer, which generates tables and static and interactive plots for exploration and publication. Sumer is publicly available at https://github.com/bzhanglab/sumer.
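
The redundancy-reduction step can be illustrated with a minimal greedy weighted set cover sketch in Python. This is not the Sumer implementation (Sumer is an R package), and the abstract does not state how gene sets are weighted. The cost function used here, 1 / -log10(p), the function name `weighted_set_cover`, and the toy gene sets and p-values are all assumptions made for illustration.

```python
import math


def weighted_set_cover(gene_sets, p_values, max_sets=None):
    """Greedy weighted set cover over enriched gene sets (illustrative sketch).

    gene_sets: dict mapping gene set name -> set of gene symbols
    p_values:  dict mapping gene set name -> enrichment p-value, 0 < p < 1
    max_sets:  optional cap on the number of gene sets to select
    """
    # ASSUMPTION: cost = 1 / -log10(p), so more significant sets are cheaper.
    # The weighting used in the paper is not given in the abstract.
    costs = {}
    for name in gene_sets:
        p = p_values[name]
        if not 0.0 < p < 1.0:
            raise ValueError(f"p-value for {name!r} must be in (0, 1)")
        costs[name] = 1.0 / -math.log10(p)

    uncovered = set().union(*gene_sets.values())
    selected = []
    while uncovered and (max_sets is None or len(selected) < max_sets):
        best, best_ratio = None, math.inf
        for name in sorted(gene_sets):  # sorted for deterministic tie-breaking
            if name in selected:
                continue
            gain = len(gene_sets[name] & uncovered)
            if gain == 0:
                continue  # fully redundant with the sets already selected
            ratio = costs[name] / gain  # cost per newly covered gene
            if ratio < best_ratio:
                best, best_ratio = name, ratio
        if best is None:
            break
        selected.append(best)
        uncovered -= gene_sets[best]
    return selected


# Hypothetical toy input: two pairs of overlapping gene sets.
gene_sets = {
    "cell cycle": {"CDK1", "CCNB1", "MKI67", "TOP2A"},
    "mitotic cell cycle": {"CDK1", "CCNB1", "MKI67"},
    "DNA repair": {"BRCA1", "RAD51"},
    "DNA damage response": {"BRCA1", "RAD51", "ATM"},
}
p_values = {
    "cell cycle": 1e-8,
    "mitotic cell cycle": 1e-6,
    "DNA repair": 1e-4,
    "DNA damage response": 1e-5,
}

print(weighted_set_cover(gene_sets, p_values))
# ['cell cycle', 'DNA damage response']
```

In this toy input, "mitotic cell cycle" and "DNA repair" are dropped because every gene they contain is already covered by a cheaper, larger set, which is the kind of redundancy the abstract describes. Passing only the overlapping or leading-edge genes as the gene set contents would mirror the further reduction mentioned above.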
