Article

DENTIST – using long reads for closing assembly gaps at high accuracy

Journal

GIGASCIENCE
Volume 11

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/gigascience/giab100

Keywords

genome assembly; long sequencing reads; assembly gaps

Funding

  1. Max Planck Society
  2. Federal Ministry of Education and Research [01IS18026C]
  3. LOEWE-Centre for Translational Biodiversity Genomics (TBG) - Hessen State Ministry of Higher Education, Research and the Arts (HMWK)

In this study, we present DENTIST, a sensitive, highly accurate, and automated pipeline method for closing gaps in short-read assemblies with long error-prone reads. Through tests on real genomic data, we demonstrate that DENTIST achieves higher accuracy and similar sensitivity compared to previous methods.
Background

Long sequencing reads allow increasing the contiguity and completeness of fragmented, short-read-based genome assemblies by closing assembly gaps, ideally at high accuracy. While several gap-closing methods have been developed, these methods often close an assembly gap with sequence that does not accurately represent the true sequence.

Findings

Here, we present DENTIST, a sensitive, highly accurate, and automated pipeline method to close gaps in short-read assemblies with long error-prone reads. DENTIST comprehensively determines repetitive assembly regions to identify reliable and unambiguous alignments of long reads to the correct loci, integrates a consensus sequence computation step to obtain a high base accuracy for the inserted sequence, and validates the accuracy of closed gaps. Unlike previous benchmarks, we generated test assemblies that have gaps at the exact positions where real short-read assemblies have gaps. Generating such realistic benchmarks for Drosophila (134 Mb genome), Arabidopsis (119 Mb), hummingbird (1 Gb), and human (3 Gb) and using simulated or real PacBio continuous long reads, we show that DENTIST consistently achieves substantially higher accuracy than previous methods while having similar sensitivity.

Conclusion

DENTIST provides an accurate approach to improve the contiguity and completeness of fragmented assemblies with long reads. DENTIST's source code, including a Snakemake workflow, conda package, and Docker container, is available at https://github.com/a-ludi/dentist. All test assemblies are available as a resource for future benchmarking at https://bds.mpi-cbg.de/hillerlab/DENTIST/.
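The idea of restricting gap closing to reliable, unambiguous long-read alignments can be illustrated with a small sketch. This is not DENTIST's implementation: the `Alignment` record, the `repeat_mask` dictionary of masked intervals, and the functions `overlaps_repeat` and `gap_spanning_reads` are hypothetical names chosen for illustration, and the rule used here (discard alignments touching a repeat-masked interval, then keep reads with exactly one remaining alignment to each contig flanking the gap) is an assumed simplification.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """Hypothetical alignment record: one long read aligned to one contig."""
    read_id: str
    contig: str
    start: int  # 0-based, half-open interval on the contig
    end: int


def overlaps_repeat(aln: Alignment, repeat_mask: dict[str, list[tuple[int, int]]]) -> bool:
    """True if the alignment intersects any masked (repetitive) interval on its contig."""
    return any(
        aln.start < r_end and r_start < aln.end
        for r_start, r_end in repeat_mask.get(aln.contig, [])
    )


def gap_spanning_reads(
    alignments: list[Alignment],
    repeat_mask: dict[str, list[tuple[int, int]]],
    left_contig: str,
    right_contig: str,
) -> list[str]:
    """Return IDs of reads that align unambiguously to both contigs flanking a gap.

    Assumed rule: drop alignments that overlap a masked region, then require
    exactly one remaining alignment on each flanking contig.
    """
    reliable = [a for a in alignments if not overlaps_repeat(a, repeat_mask)]
    counts = Counter((a.read_id, a.contig) for a in reliable)
    read_ids = {a.read_id for a in alignments}
    return sorted(
        r for r in read_ids
        if counts[(r, left_contig)] == 1 and counts[(r, right_contig)] == 1
    )


if __name__ == "__main__":
    mask = {"ctgA": [(100, 200)]}  # a repetitive region on contig ctgA
    alns = [
        Alignment("r1", "ctgA", 900, 1000), Alignment("r1", "ctgB", 0, 120),
        # r2's ctgA alignment falls in the masked repeat, so it is discarded
        Alignment("r2", "ctgA", 150, 300), Alignment("r2", "ctgB", 0, 80),
        # r3 has two reliable alignments on ctgA, so its placement is ambiguous
        Alignment("r3", "ctgA", 950, 1000), Alignment("r3", "ctgA", 400, 500),
        Alignment("r3", "ctgB", 0, 90),
    ]
    print(gap_spanning_reads(alns, mask, "ctgA", "ctgB"))  # ['r1']
```

The sketch ignores read orientation, the position of each alignment relative to the contig ends, and the consensus and validation steps mentioned in the abstract; it only shows why masking repeats helps avoid placing a read at the wrong locus.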