4.8 Article

Resolving the complexity of the human genome using single-molecule sequencing

Journal

NATURE
Volume 517, Issue 7536, Pages 608-U163

Publisher

NATURE PORTFOLIO
DOI: 10.1038/nature13907


Funding

  1. US National Institutes of Health (NIH) [HG002385, HG007497]
  2. US National Institute of Neurological Disorders and Stroke [K99NS083627]


The human genome is arguably the most complete mammalian reference assembly(1-3), yet more than 160 euchromatic gaps remain(4-6) and aspects of its structural variation remain poorly understood ten years after its completion(7-9). To identify missing sequence and genetic variation, here we sequence and analyse a haploid human genome (CHM1) using single-molecule, real-time DNA sequencing(10). We close or extend 55% of the remaining interstitial gaps in the human GRCh37 reference genome, 78% of which carried long runs of degenerate short tandem repeats, often several kilobases in length, embedded within (G+C)-rich genomic regions. We resolve the complete sequence of 26,079 euchromatic structural variants at the base-pair level, including inversions, complex insertions and long tracts of tandem repeats. Most have not been previously reported, with the greatest increases in sensitivity occurring for events less than 5 kilobases in size. Compared to the human reference, we find a significant insertional bias (3:1) in regions corresponding to complex insertions and long short tandem repeats. Our results suggest a greater complexity of the human genome in the form of variation of longer and more complex repetitive DNA that can now be largely resolved with the application of this longer-read sequencing technology.


