Article

DETECTING MUTATIONS IN MIXED SAMPLE SEQUENCING DATA USING EMPIRICAL BAYES

Journal

ANNALS OF APPLIED STATISTICS
Volume 6, Issue 3, Pages 1047-1067

Publisher

INST MATHEMATICAL STATISTICS
DOI: 10.1214/12-AOAS538

Keywords

Empirical Bayes; false discovery rates; discrete data; DNA sequencing; genome variation

Funding

  1. NSF VIGRE Fellowship
  2. National Institutes of Health [RC2HG005570, R21CA140089, P01HG000205, U01CS151920]
  3. Howard Hughes Medical Foundation Early Career Grant
  4. The Doris Duke Charitable Foundation
  5. NIH [R01 HG006137-01]
  6. NSF DMS Grant [1043204]
  7. Direct For Mathematical & Physical Scien
  8. Division Of Mathematical Sciences [1043204] Funding Source: National Science Foundation

We develop statistically based methods to detect single nucleotide DNA mutations in next generation sequencing data. Sequencing generates counts of the number of times each base was observed at hundreds of thousands to billions of genome positions in each sample. Using these counts to detect mutations is challenging because mutations may have very low prevalence and sequencing error rates vary dramatically by genome position. The discreteness of sequencing data also creates a difficult multiple testing problem: current false discovery rate methods are designed for continuous data, and work poorly, if at all, on discrete data. We show that a simple randomization technique lets us use continuous false discovery rate methods on discrete data. Our approach is a useful way to estimate false discovery rates for any collection of discrete test statistics, and is hence not limited to sequencing data. We then use an empirical Bayes model to capture different sources of variation in sequencing error rates. The resulting method outperforms existing detection approaches on example data sets.
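
The randomization idea in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: it uses the textbook randomized p-value, P(X > x) + U * P(X = x) with U uniform on (0, 1), which is exactly uniform under the null even when X is discrete, and then feeds the result to the standard Benjamini-Hochberg procedure as an example of a continuous false discovery rate method. The binomial null, the fixed error rate p0, the read depth, and the counts are all assumptions made for illustration; the paper instead estimates position-specific error rates with an empirical Bayes model.

```python
import math
import random


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n Binomial(n, p) trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def randomized_pvalue(x, n, p0, rng):
    """One-sided randomized p-value for observing x or more errors.

    Computes P(X > x) + U * P(X = x) with X ~ Binomial(n, p0) and
    U ~ Uniform(0, 1). Under the null this value is uniform on (0, 1).
    """
    tail_above = sum(binom_pmf(k, n, p0) for k in range(x + 1, n + 1))
    u = rng.random()
    return tail_above + u * binom_pmf(x, n, p0)


def benjamini_hochberg(pvalues, q):
    """Return the sorted indices rejected by Benjamini-Hochberg at level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= q * rank / m:
            cutoff = rank  # largest rank whose p-value passes its threshold
    return sorted(order[:cutoff])


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the randomization is repeatable
    depth = 200             # hypothetical read depth at every position
    p0 = 0.01               # hypothetical known sequencing error rate
    # Hypothetical non-reference base counts at eight genome positions.
    counts = [1, 3, 2, 0, 14, 2, 1, 11]
    pvals = [randomized_pvalue(x, depth, p0, rng) for x in counts]
    print("flagged positions:", benjamini_hochberg(pvals, q=0.05))
```

Because the p-values depend on the random draws U, a different seed can change borderline calls; fixing the seed keeps a run reproducible but does not remove that dependence.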
