4.7 Article

An Efficient Trimming Algorithm based on Multi-Feature Fusion Scoring Model for NGS Data

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TCBB.2019.2897558

Keywords

NGS; paired-end reads; trimming; technical sequence; base quality value; k-mer frequency; GC content

Funding

  1. National Natural Science Foundation of China [61732009, 61772557, 61622213, 61420106009, 61602156]
  2. 111 Project [B18059]
  3. Hunan Provincial Science and Technology Program [2018WK4001]
  4. Fundamental Research Funds for the Central Universities of Central South University [1053320171177]

Next-generation sequencing (NGS) has enabled exponential growth in sequencing data. However, several sequence artifacts, including error reads (base-calling errors and small insertions or deletions) and poor-quality reads, can significantly affect downstream sequence processing and analysis. Here, we present PE-Trimmer, a sensitive and specific trimming algorithm for NGS sequences. First, PE-Trimmer removes technical sequences in paired-end reads based on the characteristics of low-quality reads in NGS data. Second, PE-Trimmer determines the range of reads that need to be trimmed according to the quality score histogram of the reads in the library. To improve the accuracy of the algorithm, we design a lightweight and easy-to-explain scoring model to evaluate candidates in the trimming step. Finally, PE-Trimmer selects the appropriate trimming strategy to process the low-quality reads based on the location determined by the scoring model. PE-Trimmer is able to locate and remove adapter residues from paired-end reads. It is easily configurable and offers superior throughput in multi-threaded mode. We test PE-Trimmer on five datasets and compare it with five recent methods. The experimental results demonstrate that PE-Trimmer produces better results than the other trimmers.
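
To make the idea of scoring candidate trim positions more concrete, the following is a minimal Python sketch that combines the three features named in the keywords (base quality value, k-mer frequency, GC content) into a single score. It is not PE-Trimmer's actual model: the linear combination, the weights, the "k-mer seen more than once" threshold, and every function name (`phred_scores`, `gc_fraction`, `kmer_counts`, `score_candidate`, `best_trim`) are assumptions chosen only for illustration.

```python
from collections import Counter


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases in seq (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq.upper()) / len(seq)


def kmer_counts(reads: list[str], k: int) -> Counter:
    """Count every k-mer occurring in the given reads."""
    counts = Counter()
    for seq in reads:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def score_candidate(seq, quals, cut, kmer_table, k, library_gc,
                    weights=(0.5, 0.3, 0.2), max_q=40):
    """Score the prefix seq[:cut]; higher is better.

    Hypothetical model: a weighted sum of three terms, each in [0, 1].
    """
    kept_seq, kept_q = seq[:cut], quals[:cut]
    if not kept_seq:
        return 0.0
    # Mean base quality of the kept prefix, capped and normalised.
    q_term = min(sum(kept_q) / len(kept_q), max_q) / max_q
    # Closeness of the prefix GC content to the library-wide GC content.
    gc_term = 1.0 - abs(gc_fraction(kept_seq) - library_gc)
    # Fraction of prefix k-mers seen more than once in the library.
    kmers = [kept_seq[i:i + k] for i in range(len(kept_seq) - k + 1)]
    k_term = sum(kmer_table[km] > 1 for km in kmers) / len(kmers) if kmers else 0.0
    w_q, w_gc, w_k = weights
    return w_q * q_term + w_gc * gc_term + w_k * k_term


def best_trim(seq, quality, kmer_table, k, library_gc, min_len):
    """Return the prefix length with the best score; ties favour longer reads."""
    quals = phred_scores(quality)
    candidates = range(min_len, len(seq) + 1)
    return max(
        candidates,
        key=lambda cut: (score_candidate(seq, quals, cut, kmer_table, k, library_gc), cut),
    )


if __name__ == "__main__":
    library = ["ACGTACGT", "ACGTACGT", "ACGTACGTTGCA"]
    table = kmer_counts(library, k=4)
    cut = best_trim("ACGTACGTTGCA", "IIIIIIII####", table,
                    k=4, library_gc=0.5, min_len=4)
    print("keep first", cut, "bases")
```

In this toy example the last four bases have low quality values and contain k-mers seen only once, so the sketch prefers a cut that drops them. A real trimmer would also handle 5' trimming, paired-end consistency, and adapter detection, none of which are modelled here.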
