4.7 Article

Identifying the causes and consequences of assembly gaps using a multiplatform genome assembly of a bird-of-paradise

Journal

MOLECULAR ECOLOGY RESOURCES
Volume 21, Issue 1, Pages 263-286

Publisher

WILEY
DOI: 10.1111/1755-0998.13252

Keywords

chromosome-level assembly; GC content; genome assembly; Hi-C; long reads; satellite repeat; transposable element

Funding

  1. Carlsbergfondet [CF17-0248]
  2. Villum Fonden [15560]
  3. Science for Life Laboratory [2015-R14]
  4. National Geographic Society [8853-10]
  5. Vetenskapsradet [2016-05139, 621-2014-5113]
  6. Svenska Forskningsradet Formas [2017-01597]
  7. Vinnova [2017-01597]
  8. Swedish Research Council [2016-05139]
  9. Formas [2017-01597]


Genome assemblies are being produced rapidly with advances in sequencing technologies, but challenges in assembling repeat-rich and GC-rich regions limit insights into genome evolution. The most efficient approach involves a multiplatform assembly utilizing long-read, linked-read, and proximity sequencing technologies to minimize gaps and optimize completeness of both coding and noncoding parts of nonmodel genomes.
Genome assemblies are currently being produced at an impressive rate by consortia and individual laboratories. The low costs and increasing efficiency of sequencing technologies now enable assembling genomes at unprecedented quality and contiguity. However, the difficulty in assembling repeat-rich and GC-rich regions (genomic dark matter) limits insights into the evolution of genome structure and regulatory networks. Here, we compare the efficiency of currently available sequencing technologies (short/linked/long reads and proximity ligation maps) and combinations thereof in assembling genomic dark matter. By adopting different de novo assembly strategies, we compare individual draft assemblies to a curated multiplatform reference assembly and identify the genomic features that cause gaps within each assembly. We show that a multiplatform assembly implementing long-read, linked-read and proximity sequencing technologies performs best at recovering transposable elements, multicopy MHC genes, GC-rich microchromosomes and the repeat-rich W chromosome. Telomere-to-telomere assemblies are not a reality yet for most organisms, but by leveraging technology choice it is now possible to minimize genome assembly gaps for downstream analysis. We provide a roadmap to tailor sequencing projects for optimized completeness of both the coding and noncoding parts of nonmodel genomes.
