Article

LRScaf: improving draft genomes using long noisy reads

Journal

BMC Genomics
Volume 20, Issue 1

Publisher

BMC
DOI: 10.1186/s12864-019-6337-2

Keywords

LRScaf; Scaffolding algorithm; Third generation sequencing technologies; PacBio; Nanopore

Funding

  1. Dapeng New District Special Fund for Industrial Development [KY20160204]
  2. National Key Research and Development Program of China [2016YFC1200600]
  3. National Natural Science Foundation of China [31571353]
  4. Fundamental Research Funds for Central Non-profit Scientific Institution [Y2016PT54]
  5. Fund of Key Laboratory of Shenzhen [ZDSYS20141118170111640]
  6. Agricultural Science and Technology Innovation Program
  7. Shenzhen Science and Technology Research Funding [JSGG20160429104101251]
  8. Key Forestry Public Welfare Project [201504105]
  9. Agricultural Science and Technology Innovation Program Cooperation and Innovation Mission [CAAS-XTCX2016]

Background: The advent of third-generation sequencing (TGS) technologies opens the door to improving genome assembly. Long reads are promising for enhancing the quality of fragmented draft assemblies constructed from next-generation sequencing (NGS) technologies. To date, a few algorithms capable of improving draft assemblies have been released, including SSPACE-LongRead, OPERA-LG, SMIS, npScarf, DBG2OLC, Unicycler, and LINKS. Hybrid assembly of large genomes, however, remains challenging.

Results: We develop a scalable and computationally efficient scaffolder, Long Reads Scaffolder (LRScaf, https://github.com/shingocat/lrscaf), that is capable of significantly boosting assembly contiguity using long reads. In this study, we summarise a comprehensive performance assessment of state-of-the-art scaffolders and LRScaf on seven organisms: E. coli, S. cerevisiae, A. thaliana, O. sativa, S. pennellii, Z. mays, and H. sapiens. LRScaf significantly improves the contiguity of draft assemblies, e.g., increasing the NGA50 value of CHM1 from 127.1 kbp to 9.4 Mbp using a 20-fold coverage PacBio dataset and the NGA50 value of NA12878 from 115.3 kbp to 12.9 Mbp using a 35-fold coverage Nanopore dataset. In addition, LRScaf achieves the highest NGA50 on A. thaliana, S. pennellii, Z. mays, and H. sapiens. Moreover, LRScaf has the shortest run time of the scaffolders compared, and its peak RAM remains practical for large genomes (e.g., 20.3 and 62.6 GB on CHM1 and NA12878, respectively).

Conclusions: The new algorithm, LRScaf, yields the best or, at least, moderate scaffold contiguity and accuracy in the shortest run time compared with other scaffolding algorithms. Furthermore, LRScaf provides a cost-effective way to improve the contiguity of draft assemblies of large genomes.
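
The contiguity figures above are reported as NGA50. As a rough illustration of what this family of metrics measures, the sketch below computes the simpler N50 and NG50 values from a list of sequence lengths. It is not taken from LRScaf, and the function name `n50` and its `genome_size` parameter are assumptions made for this example. NGA50 is commonly computed the same way as NG50, but over blocks aligned to a reference after breaking at misassemblies, which requires an alignment step that is omitted here.

```python
def n50(lengths, genome_size=None):
    """Return N50 of the given sequence lengths.

    If genome_size is given, return NG50 instead: the threshold is half
    of the (estimated) genome size rather than half of the assembly size.
    Hypothetical helper for illustration; not part of LRScaf.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")

    total = genome_size if genome_size is not None else sum(lengths)
    target = total / 2

    running = 0
    # Walk from the longest sequence down until half of the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length

    # Only reachable for NG50 when the assembly covers less than
    # half of the stated genome size.
    return 0


if __name__ == "__main__":
    scaffolds = [80, 70, 50, 40, 30, 20]
    print("N50:", n50(scaffolds))
    print("NG50 (genome size 400):", n50(scaffolds, genome_size=400))
```

Because the NG50 threshold depends on the genome size rather than on the assembly itself, it allows assemblies of different total length to be compared on the same footing, which is one reason reference-based variants such as NGA50 are preferred in scaffolder benchmarks.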
