4.7 Article

High-quality Arabidopsis thaliana Genome Assembly with Nanopore and HiFi Long Reads

Journal

GENOMICS PROTEOMICS & BIOINFORMATICS
Volume 20, Issue 1, Pages 4-13

Publisher

ELSEVIER
DOI: 10.1016/j.gpb.2021.08.003

Keywords

Centromere architecture; CENH3; Bacterial artificial chromosome; Telomere-to-telomere; Model plant

Funding

  1. National Natural Science Foundation of China [62172325, 32070663]
  2. China Postdoctoral Science Foundation [2020M673420]
  3. Fundamental Research Funds for the Central Universities, China
  4. World-Class Universities (Disciplines)
  5. Characteristic Development Guidance Funds for the Central Universities, China

This study assembled a high-quality, nearly complete genome of Arabidopsis thaliana (Col-0) by combining Oxford Nanopore ultra-long reads, PacBio high-fidelity long reads, and Hi-C data. The new assembly adds 14.6 Mb of sequence absent from the TAIR10.1 reference genome and provides insight into the global pattern of centromeric polymorphisms and into genetic and epigenetic features in plants.
Arabidopsis thaliana is an important and long-established model species for plant molecular biology, genetics, epigenetics, and genomics. However, the latest version of the reference genome still contains a significant number of missing segments. Here, we report a high-quality and almost complete Col-0 genome assembly with two gaps (named Col-XJTU), produced by combining Oxford Nanopore Technologies ultra-long reads, Pacific Biosciences high-fidelity long reads, and Hi-C data. The total genome assembly size is 133,725,193 bp, introducing 14.6 Mb of novel sequences compared to the TAIR10.1 reference genome. All five chromosomes of the Col-XJTU assembly are highly accurate, with consensus quality (QV) scores > 60 (ranging from 62 to 68), which are higher than those of the TAIR10.1 reference (ranging from 45 to 52). We completely resolved chromosome (Chr) 3 and Chr5 in a telomere-to-telomere manner. Chr4 was completely resolved except for the nucleolar organizing regions, which comprise long repetitive DNA fragments. The Chr1 centromere (CEN1), reportedly around 9 Mb in length, is particularly challenging to assemble due to the presence of tens of thousands of CEN180 satellite repeats. Using cutting-edge sequencing data and novel computational approaches, we assembled a 3.8-Mb-long CEN1 and a 3.5-Mb-long CEN2. We also investigated the structure and epigenetics of centromeres. Four clusters of CEN180 monomers were detected, and the centromere-specific histone H3-like protein (CENH3) exhibited a strong preference for CEN180 Cluster 3. Moreover, we observed hypomethylation patterns in CENH3-enriched regions. We believe that this high-quality genome assembly, Col-XJTU, will serve as a valuable reference to better understand the global pattern of centromeric polymorphisms, as well as the genetic and epigenetic features in plants.
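
To put the reported QV scores in perspective, the sketch below converts a consensus QV into a per-base error rate using the standard Phred relation QV = -10 * log10(error rate). The function names are illustrative and not taken from the paper. Applying a single QV to the whole assembly length is a simplifying assumption, since the abstract reports QV per chromosome.

```python
def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled consensus quality value into a per-base error rate."""
    return 10 ** (-qv / 10)


def expected_errors(qv: float, length_bp: int) -> float:
    """Expected number of erroneous bases in a sequence of length_bp at the given QV."""
    return qv_to_error_rate(qv) * length_bp


# Assembly size reported in the abstract.
ASSEMBLY_SIZE_BP = 133_725_193

if __name__ == "__main__":
    # QV ranges quoted in the abstract: TAIR10.1 (45 to 52), Col-XJTU (62 to 68).
    # Simplification: each QV is applied to the full assembly length.
    for label, qv in [("TAIR10.1 low", 45), ("TAIR10.1 high", 52),
                      ("Col-XJTU low", 62), ("Col-XJTU high", 68)]:
        rate = qv_to_error_rate(qv)
        errors = expected_errors(qv, ASSEMBLY_SIZE_BP)
        print(f"{label}: QV {qv} -> error rate {rate:.2e}, ~{errors:.0f} errors")
```

On this scale, every 10-point gain in QV corresponds to a tenfold drop in the per-base error rate, so QV 60 means roughly one error per million bases.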
