☆ 4.5 Article

On the Sample Complexity of Cancer Pathways Identification

JOURNAL OF COMPUTATIONAL BIOLOGY (2016)

期刊

JOURNAL OF COMPUTATIONAL BIOLOGY

卷 23, 期 1, 页码 30-41

出版社

MARY ANN LIEBERT, INC

DOI: 10.1089/cmb.2015.0100

关键词

cancer pathways; exclusivity; PAC learning; VC dimension

类别

Biochemical Research Methods Biotechnology & Applied Microbiology Computer Science, Interdisciplinary Applications Mathematical & Computational Biology Statistics & Probability

资金

NSF [IIS-1247581]
NIH [R01-CA180776]
University of Padova [CPDA121378/12]
NATIONAL CANCER INSTITUTE [R01CA180776] Funding Source: NIH RePORTER

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

摘要

Advances in DNA sequencing technologies have enabled large cancer sequencing studies, collecting somatic mutation data from a large number of cancer patients. One of the main goals of these studies is the identification of all cancer genesgenes associated with cancer. Its achievement is complicated by the extensive mutational heterogeneity of cancer, due to the fact that important mutations in cancer target combinations of genes (i.e., pathways). Recently, the pattern of mutual exclusivity among mutations in a cancer pathway has been observed, and methods that find significant combinations of cancer genes by detecting mutual exclusivity have been proposed. A key question in the analysis of mutual exclusivity is the computation of the minimum number of samples required to reliably find a meaningful set of mutually exclusive mutations in the data, or conclude that there is no such set. In general, the problem of determining the sample complexity, or the number of samples required to identify significant combinations of features, of genomic problems is largely unexplored. In this work we propose a framework to analyze the sample complexity of problems that arise in the study of genomic datasets. Our framework is based on tools from combinatorial analysis and statistical learning theory that have been used for the analysis of machine learning and probably approximately correct (PAC) learning. We use our framework to analyze the problem of the identification of cancer pathways through mutual exclusivity analysis. We analytically derive matching upper and lower bounds on the sample complexity of the problem, showing that sample sizes much larger than currently available may be required to identify all the cancer genes in a pathway. We also provide two algorithms to find a cancer pathway from a large genomic dataset. On simulated and cancer data, we show that our algorithms can be used to identify cancer pathways from large genomic datasets.

On the Sample Complexity of Cancer Pathways Identification

期刊

JOURNAL OF COMPUTATIONAL BIOLOGY

出版社

MARY ANN LIEBERT, INC

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

On the Sample Complexity of Cancer Pathways Identification

期刊

JOURNAL OF COMPUTATIONAL BIOLOGY

出版社

MARY ANN LIEBERT, INC

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文