Article

PyroHMMsnp: an SNP caller for Ion Torrent and 454 sequencing data

Journal

NUCLEIC ACIDS RESEARCH
Volume 41, Issue 13, Pages -

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/nar/gkt372

Keywords

-

Funding

  1. National Basic Research Program of China [2012CB316504]
  2. National High Technology Research and Development Program of China [2012AA020401]
  3. National Natural Science Foundation of China [61175002, 60805010]
  4. Tsinghua University Initiative Scientific Research Program, NIH Center of Excellence in Genomic Sciences [NIH/HG 2 P50 HG002790-06]
  5. NIH/NHGRI [1U01 HG006531-01]
  6. NSF/DMS ATD [7031026]
  7. National High Technology Research and Development Program of China
  8. Directorate For Geosciences
  9. Division Of Ocean Sciences [1136818] Funding Source: National Science Foundation
  10. Division Of Mathematical Sciences
  11. Direct For Mathematical & Physical Scien [1043075] Funding Source: National Science Foundation

Both 454 and Ion Torrent sequencers are capable of producing large amounts of long, high-quality sequencing reads. However, because both methods sequence an entire homopolymer in one cycle, they both suffer from homopolymer uncertainty and incorporation asynchronization. In mapping, such sequencing errors can shift alignments around homopolymers and thus induce incorrect mismatches, which have become a critical barrier to the accurate detection of single nucleotide polymorphisms (SNPs). In this article, we propose a hidden Markov model (HMM) that statistically and explicitly formulates homopolymer sequencing errors as overcalls, undercalls, insertions and deletions. We use a hierarchical model to describe the sequencing and base-calling processes, and we estimate the parameters of the HMM from resequencing data with an expectation-maximization algorithm. Based on the HMM, we develop a realignment-based SNP-calling program, termed PyroHMMsnp, which realigns read sequences around homopolymers according to the error model and then infers the underlying genotype using a Bayesian approach. Simulation experiments show that the performance of PyroHMMsnp is exceptional across various sequencing coverages in terms of sensitivity, specificity and F1 measure, compared with other tools. Analysis of human resequencing data shows that PyroHMMsnp predicts 12.9% more SNPs than Samtools while achieving a higher specificity. (http://code.google.com/p/pyrohmmsnp/)
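
To make the final step of the abstract concrete, the following is a minimal sketch of Bayesian genotype inference at a single site. It is not the PyroHMMsnp model: it assumes reads are already aligned, replaces the HMM error model with a fixed per-base substitution error rate, and uses a simple prior that favors the homozygous reference genotype. The function names and the `error_rate` and `snp_prior` parameters are hypothetical choices made for illustration.

```python
import math
from itertools import combinations_with_replacement

ALLELES = "ACGT"


def genotype_posteriors(bases, ref, error_rate=0.01, snp_prior=0.001):
    """Posterior over diploid genotypes given the bases observed at one site.

    Assumptions (not from the paper): independent reads, a uniform
    substitution error rate, and a prior that puts 1 - snp_prior on the
    homozygous reference genotype and splits snp_prior over the rest.
    """
    genotypes = list(combinations_with_replacement(ALLELES, 2))
    n_nonref = len(genotypes) - 1
    log_post = {}
    for g in genotypes:
        prior = 1.0 - snp_prior if g == (ref, ref) else snp_prior / n_nonref
        total = math.log(prior)
        for b in bases:
            # Each read samples one of the two alleles with probability 1/2.
            p = sum(
                0.5 * ((1.0 - error_rate) if b == a else error_rate / 3.0)
                for a in g
            )
            total += math.log(p)
        log_post[g] = total
    # Normalize in log space to avoid underflow at high coverage.
    peak = max(log_post.values())
    norm = sum(math.exp(v - peak) for v in log_post.values())
    return {g: math.exp(v - peak) / norm for g, v in log_post.items()}


def call_genotype(bases, ref, **kwargs):
    """Return the maximum a posteriori genotype as a sorted allele pair."""
    post = genotype_posteriors(bases, ref, **kwargs)
    return max(post, key=post.get)
```

For example, `call_genotype("AAAAAGGGGG", "A")` selects the heterozygous pair `("A", "G")`, while a pileup of ten `A` bases keeps the homozygous reference call. In a realignment-based caller, the per-read likelihoods would come from the error model rather than from a fixed error rate.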
