4.7 Article

MapReduce for accurate error correction of next-generation sequencing data

Journal

BIOINFORMATICS
Volume 33, Issue 23, Pages 3844-3851

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btx089


Funding

  1. National Natural Science Foundation for Young Scientists of China [31501070]
  2. Natural Science Foundation of Guangxi Province, China [2016GXNSFCA380006]
  3. Australian Research Council [DP130102124]


Motivation: Next-generation sequencing platforms have produced huge amounts of sequence data, which is revolutionizing every aspect of genetic and genomic research. However, these datasets contain a considerable number of machine-induced errors; for example, the substitution error rate can be as high as 2.5%. Existing error-correction methods are still far from perfect. In fact, they sometimes introduce more new errors than they correct, especially the prevalent k-mer based methods. Existing methods have also made limited use of on-demand cloud computing.

Results: We introduce an error-correction method named MEC, which uses a two-layered MapReduce technique to achieve high correction performance. In the first layer, all input sequences are mapped to groups so that candidate erroneous bases can be identified in parallel. In the second layer, the candidate erroneous bases at the same position are linked together across all groups to make statistically reliable corrections. Experiments on real and simulated datasets show that our method remarkably outperforms existing methods: its per-position error rate is consistently the lowest, and its correction gain is always the highest.
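The two-layer idea described in the abstract can be illustrated with a small single-process sketch. This is not the MEC implementation: the abstract does not say how reads are grouped or how corrections are decided, so the sketch assumes that reads are grouped by shared k-mers, that a base is a candidate error when it disagrees with a strict column majority inside its group, and that a correction is applied only when enough groups propose the same base for the same read position. The names `layer1_map`, `layer1_reduce`, `shuffle` and `correct`, and the thresholds `K`, `MIN_DEPTH` and `MIN_VOTES`, are all hypothetical.

```python
from collections import Counter, defaultdict

K = 3          # assumed k-mer length used as the grouping key
MIN_DEPTH = 3  # assumed minimum column depth before a base is questioned
MIN_VOTES = 2  # assumed number of groups that must agree on a correction


def layer1_map(reads):
    """Layer 1 map: emit (k-mer, (read_id, offset)) so reads sharing a k-mer meet."""
    for read_id, seq in enumerate(reads):
        for offset in range(len(seq) - K + 1):
            yield seq[offset:offset + K], (read_id, offset)


def layer1_reduce(reads, members):
    """Layer 1 reduce: within one group, flag bases that disagree with the majority."""
    columns = defaultdict(list)  # column index is relative to the shared k-mer
    for read_id, offset in members:
        for pos, base in enumerate(reads[read_id]):
            columns[pos - offset].append((read_id, pos, base))
    for entries in columns.values():
        if len(entries) < MIN_DEPTH:
            continue  # too little evidence in this column
        counts = Counter(base for _, _, base in entries)
        consensus, support = counts.most_common(1)[0]
        if support * 2 <= len(entries):
            continue  # no strict majority, so nothing is proposed
        for read_id, pos, base in entries:
            if base != consensus:
                yield (read_id, pos), consensus  # candidate erroneous base


def shuffle(pairs):
    """Group values by key, standing in for the MapReduce shuffle phase."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def correct(reads):
    """Run both layers and return the corrected reads."""
    groups = shuffle(layer1_map(reads))
    candidates = (
        cand for members in groups.values() for cand in layer1_reduce(reads, members)
    )
    # Layer 2: link candidates for the same (read, position) across all groups.
    linked = shuffle(candidates)
    corrected = [list(seq) for seq in reads]
    for (read_id, pos), proposals in linked.items():
        base, votes = Counter(proposals).most_common(1)[0]
        if votes >= MIN_VOTES:
            corrected[read_id][pos] = base
    return ["".join(seq) for seq in corrected]


if __name__ == "__main__":
    # The last read carries one substitution (T read as A at index 4).
    print(correct(["ACGTTGCA", "CGTTGCAT", "GTTGCATC", "ACGTAGCA"]))
```

In this toy input the substitution is proposed independently by two k-mer groups, which is what allows the second layer to accept it. In a real deployment each group would be processed by a separate reducer, and the thresholds would need a statistical justification that this sketch does not attempt.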
