Article

Resolving the complexity of the human genome using single-molecule sequencing

Journal

NATURE
Volume 517, Issue 7536, Pages 608-U163

Publisher

NATURE PORTFOLIO
DOI: 10.1038/nature13907

Keywords

-

Funding

  1. US National Institutes of Health (NIH) [HG002385, HG007497]
  2. US National Institute of Neurological Disorders and Stroke [K99NS083627]

The human genome is arguably the most complete mammalian reference assembly(1-3), yet more than 160 euchromatic gaps remain(4-6) and aspects of its structural variation remain poorly understood ten years after its completion(7-9). To identify missing sequence and genetic variation, here we sequence and analyse a haploid human genome (CHM1) using single-molecule, real-time DNA sequencing(10). We close or extend 55% of the remaining interstitial gaps in the human GRCh37 reference genome, 78% of which carried long runs of degenerate short tandem repeats, often several kilobases in length, embedded within (G+C)-rich genomic regions. We resolve the complete sequence of 26,079 euchromatic structural variants at the base-pair level, including inversions, complex insertions and long tracts of tandem repeats. Most have not been previously reported, with the greatest increases in sensitivity occurring for events less than 5 kilobases in size. Compared to the human reference, we find a significant insertional bias (3:1) in regions corresponding to complex insertions and long short tandem repeats. Our results suggest a greater complexity of the human genome in the form of variation of longer and more complex repetitive DNA that can now be largely resolved with the application of this longer-read sequencing technology.
