4.6 Article

Using quality scores and longer reads improves accuracy of Solexa read mapping

期刊

BMC BIOINFORMATICS
卷 9, 期 -, 页码 -

出版社

BMC
DOI: 10.1186/1471-2105-9-128

关键词

-

资金

  1. NHGRI NIH HHS [HG001696, R01 HG001696] Funding Source: Medline

向作者/读者索取更多资源

Background: Second-generation sequencing has the potential to revolutionize genomics and impact all areas of biomedical science. New technologies will make re-sequencing widely available for such applications as identifying genome variations or interrogating the oligonucleotide content of a large sample ( e. g. ChIP-sequencing). The increase in speed, sensitivity and availability of sequencing technology brings demand for advances in computational technology to perform associated analysis tasks. The Solexa/Illumina IG sequencer can produce tens of millions of reads, ranging in length from similar to 25-50 nt, in a single experiment. Accurately mapping the reads back to a reference genome is a critical task in almost all applications. Two sources of information that are often ignored when mapping reads from the Solexa technology are the 3' ends of longer reads, which contain a much higher frequency of sequencing errors, and the base-call quality scores. Results: To investigate whether these sources of information can be used to improve accuracy when mapping reads, we developed the RMAP tool, which can map reads having a wide range of lengths and allows base-call quality scores to determine which positions in each read are more important when mapping. We applied RMAP to analyze data re-sequenced from two human BAC regions for varying read lengths, and varying criteria for use of quality scores. RMAP is freely available for downloading at http://rulai.cshl.edu/rmap/. Conclusion: Our results indicate that significant gains in Solexa read mapping performance can be achieved by considering the information in 3' ends of longer reads, and appropriately using the base-call quality scores. The RMAP tool we have developed will enable researchers to effectively exploit this information in targeted re-sequencing projects.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.6
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据