Article; Proceedings Paper

TideHunter: efficient and sensitive tandem repeat detection from noisy long-reads using seed-and-chain

Journal

BIOINFORMATICS
Volume 35, Issue 14, Pages I200-I207

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btz376

Funding

  1. Children's Hospital of Philadelphia
  2. National Key Research and Development Program of China [2018YFC0910504, 2017YFC0907503]

Abstract

Motivation: Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) sequencing technologies can produce long reads of up to tens of kilobases, but with high error rates. To reduce sequencing errors, Rolling Circle Amplification (RCA) has been used to improve library preparation by amplifying circularized template molecules. The linear products of RCA contain multiple tandem copies of the template molecule. With additional in silico processing steps, these tandem copies can be collapsed into a consensus sequence that is more accurate than the original raw reads. Existing pipelines that use alignment-based methods to discover the tandem repeat patterns in long reads are either inefficient or lack sensitivity.

Results: We present TideHunter, a novel tandem repeat detection and consensus calling tool that efficiently discovers tandem repeat patterns and generates high-quality consensus sequences from amplified, tandemly repeated long-read sequencing data. TideHunter works with noisy long reads (PacBio and ONT) at error rates of up to 20% and imposes no limit on the maximal repeat pattern size. We benchmarked TideHunter on simulated and real datasets with varying error rates and repeat pattern sizes. TideHunter is tens of times faster than state-of-the-art methods and achieves higher sensitivity and accuracy.

Availability and implementation: TideHunter is written in C, is open source, and is available at https://github.com/yangao07/TideHunter
