Article

SPARCoC: A New Framework for Molecular Pattern Discovery and Cancer Gene Identification

Journal

PLOS ONE
Volume 10, Issue 3

Publisher

Public Library of Science
DOI: 10.1371/journal.pone.0117135

Funding

  1. Hong Kong Research Grants Council (RGC) Early Career Scheme (ECS) [CUHK 439513]
  2. NSF [CMMI-1161242]
  3. NIH [LM010098, LM009012]
  4. National Institutes of Health grants from the National Center for Research Resources [P20RR016460]
  5. National Institute of General Medical Sciences [P20GM103429]
  6. NATIONAL CANCER INSTITUTE [P30CA023108] Funding Source: NIH RePORTER
  7. NATIONAL CENTER FOR RESEARCH RESOURCES [P20RR016460] Funding Source: NIH RePORTER
  8. NATIONAL INSTITUTE OF GENERAL MEDICAL SCIENCES [P20GM103429] Funding Source: NIH RePORTER
  9. NATIONAL LIBRARY OF MEDICINE [R01LM010098, R01LM009012] Funding Source: NIH RePORTER
  10. EPSCoR [1003970] Funding Source: National Science Foundation

It is challenging to cluster cancer patients of a certain histopathological type into molecular subtypes of clinical importance and identify gene signatures directly relevant to the subtypes. Current clustering approaches have inherent limitations, which prevent them from gauging the subtle heterogeneity of the molecular subtypes. In this paper we present a new framework: SPARCoC (Sparse-CoClust), which is based on a novel Common-background and Sparse-foreground Decomposition (CSD) model and the Maximum Block Improvement (MBI) co-clustering technique. SPARCoC has clear advantages compared with widely-used alternative approaches: hierarchical clustering (Hclust) and nonnegative matrix factorization (NMF). We apply SPARCoC to the study of lung adenocarcinoma (ADCA), an extremely heterogeneous histological type, and a significant challenge for molecular subtyping. For testing and verification, we use high quality gene expression profiling data of lung ADCA patients, and identify prognostic gene signatures which could cluster patients into subgroups that are significantly different in their overall survival (with p-values < 0.05). Our results are only based on gene expression profiling data analysis, without incorporating any other feature selection or clinical information; we are able to replicate our findings with completely independent datasets. SPARCoC is broadly applicable to large-scale genomic data to empower pattern discovery and cancer gene identification.
