4.7 Article

Benchmarking of long-read sequencing, assemblers and polishers for yeast genome

Journal

BRIEFINGS IN BIOINFORMATICS
Volume 23, Issue 3, Pages -

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bib/bbac146

Keywords

de novo assembly; long-read sequencing; benchmarking; yeast; data depth; genome analysis

Funding

  1. National Key Research and Development Program of China [2018YFB1501401, 2021YFC2100600, 2019YFA0905400]
  2. National Natural Science Foundation of China [21978167, 31970026, 32070679]
  3. Open Project Funding of the State Key Laboratory of Biocatalysis and Enzyme Engineering [SKLBEE2018016]

The quality of genome construction depends on the sequencing platform, data depth, and tools used; for ONT datasets, Flye ranked highest by C_score evaluation.
Background: Long reads from third-generation sequencing substantially improve the quality of de novo genome assembly, but their relatively high single-base error rate has drawn criticism. Sequencing accuracy and throughput continue to improve, and new tools keep emerging. PacBio HiFi sequencing and Oxford Nanopore Technologies (ONT) PromethION are two current platforms that offer low error rates and ultralong, high-throughput reads. Selecting appropriate sequencing platforms, depths and genome assembly tools is therefore an urgent need for producing high-quality genomes in an era of explosive data production.

Methods: We performed 455 de novo assemblies of yeast S288C on a high-coverage ONT dataset (7 assemblers, each with 4 polishing pipelines or without polishing, on 13 subsets of different depths) and 88 on a high-coverage HiFi dataset (4 assemblers, with or without polishing, on 11 subsets of different depths). Assembly quality was evaluated with the Quality Assessment Tool (QUAST), Benchmarking Universal Single-Copy Orthologs (BUSCO) and the newly proposed Comprehensive_score (C_score). In addition, we applied four preferable pipelines to assemble the genomes of nonreference yeast strains.

Results: The assembler plays an essential role in genome construction, especially for low-depth datasets. For ONT datasets, Flye is superior to the other tools according to C_score evaluation. Polishing with Pilon improves the accuracy of the preassemblies and polishing with Medaka improves their continuity, and the pipeline combining the two performed well on most quality metrics. For HiFi datasets, Flye and NextDenovo performed better than the other tools, and polishing is also necessary. Sufficient data depth is required for high-quality genome construction from ONT (>80X) and HiFi (>20X) datasets.
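The assembly counts and depth thresholds stated in the abstract can be checked with a short sketch. It enumerates the benchmark grid (assembler x polishing option x depth subset) and encodes the reported minimum depths. The function names `count_assemblies` and `meets_recommended_depth` are hypothetical helpers introduced here for illustration; integer indices stand in for the assemblers and subsets, since the abstract does not list them all by name.

```python
from itertools import product

# Minimum depths reported in the abstract (strictly greater than these values).
MIN_DEPTH = {"ONT": 80, "HiFi": 20}


def count_assemblies(n_assemblers, n_polishing_options, n_depth_subsets):
    """Count every (assembler, polishing option, depth subset) combination."""
    grid = product(
        range(n_assemblers),
        range(n_polishing_options),
        range(n_depth_subsets),
    )
    return sum(1 for _ in grid)


def meets_recommended_depth(platform, depth_x):
    """Return True if depth_x exceeds the depth the abstract reports as needed."""
    return depth_x > MIN_DEPTH[platform]


# ONT: 7 assemblers, 4 polishing pipelines plus the unpolished case, 13 subsets.
ont_total = count_assemblies(7, 4 + 1, 13)
# HiFi: 4 assemblers, polished or unpolished, 11 subsets.
hifi_total = count_assemblies(4, 2, 11)

print(ont_total, hifi_total)  # 455 88
```

Counting "without polishing" as one extra option per assembler is what makes the totals match the 455 and 88 assemblies reported in the Methods.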
