Article

Long-read sequence and assembly of segmental duplications

Journal

Nature Methods
Volume 16, Issue 1, Pages 88-+

Publisher

Nature Publishing Group
DOI: 10.1038/s41592-018-0236-3

Funding

  1. US National Institutes of Health (NIH) [HG002385, HG007635, HG003079]
  2. National Library of Medicine (NLM) Big Data Training Grant for Genomics and Neuroscience [5T32LM012419-04]
  3. National Human Genome Research Institute (NHGRI) training grant [5T32HG000035-23]
  4. National Human Genome Research Institute [U54HG003079, T32HG000035, R01HG002385, U41HG007635, R01HG010169] Funding Source: NIH RePORTER
  5. National Institute of General Medical Sciences [T32GM007266] Funding Source: NIH RePORTER
  6. National Library of Medicine [T32LM012419] Funding Source: NIH RePORTER

We have developed a computational method based on polyploid phasing of long sequence reads to resolve collapsed regions of segmental duplications within genome assemblies. Segmental Duplication Assembler (SDA; https://github.com/mvollger/SDA) constructs graphs in which paralogous sequence variants (PSVs) define the nodes and long-read sequences provide attraction and repulsion edges, enabling the partition and assembly of long reads corresponding to distinct paralogs. We apply it to single-molecule, real-time (SMRT) sequence data from three human genomes and recover 33-79 megabase pairs (Mb) of duplications, in which approximately half of the loci are diverged (<99.8% sequence identity) from the reference genome. We show that the corresponding sequence is highly accurate (>99.9%) and that the diverged sequence corresponds to copy-number-variable paralogs that are absent from the human reference genome. Our method can be applied to other complex genomes to resolve the last gene-rich gaps, improve duplicate gene annotation, and better understand copy-number-variant genetic diversity at the base-pair level.
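The read-partitioning idea described in the abstract can be illustrated with a small sketch. This is not the SDA implementation (the repository linked above holds that); it is a simplified model assuming each read has already been reduced to a map from hypothetical PSV identifiers to 1 (paralog-specific base) or 0 (collapsed consensus base). A pair of PSVs gains attraction when a read carries both alternate bases and repulsion when a read carries only one. A greedy pivot-based correlation clustering, swapped in here purely for illustration, then groups the PSVs, and each read is assigned to the group whose alternate bases it carries most often. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Assumption: each read is pre-reduced to {psv_id: 1 if it carries the
# paralog-specific (alternate) base, 0 if it carries the collapsed consensus
# base}. PSVs the read does not span are absent. Names are hypothetical.
Read = dict[int, int]


def build_psv_graph(reads: list[Read]) -> dict[tuple[int, int], int]:
    """Net edge weight per PSV pair: +1 for each read carrying both alternate
    bases (attraction), -1 for each read carrying exactly one (repulsion)."""
    weights: dict[tuple[int, int], int] = defaultdict(int)
    for read in reads:
        for a, b in combinations(sorted(read), 2):
            if read[a] and read[b]:
                weights[(a, b)] += 1
            elif read[a] != read[b]:
                weights[(a, b)] -= 1
            # Reads with the consensus base at both PSVs add no evidence.
    return dict(weights)


def pivot_cluster(nodes: set[int], weights: dict[tuple[int, int], int]) -> list[set[int]]:
    """Deterministic pivot correlation clustering: each pivot absorbs the
    remaining PSVs joined to it by a net-positive (attraction) edge."""
    remaining = sorted(nodes)
    clusters: list[set[int]] = []
    while remaining:
        pivot = remaining[0]
        cluster = {pivot}
        for n in remaining[1:]:
            if weights.get((min(pivot, n), max(pivot, n)), 0) > 0:
                cluster.add(n)
        clusters.append(cluster)
        remaining = [n for n in remaining if n not in cluster]
    return clusters


def assign_reads(reads: list[Read], clusters: list[set[int]]) -> dict[int, list[int]]:
    """Map cluster index -> indices of reads carrying the most alternate bases
    from that cluster. Ties go to the lowest cluster index; reads with no
    alternate base in any cluster are left unassigned."""
    groups: dict[int, list[int]] = defaultdict(list)
    for i, read in enumerate(reads):
        scores = [sum(read.get(p, 0) for p in c) for c in clusters]
        best = max(scores, default=0)
        if best > 0:
            groups[scores.index(best)].append(i)
    return dict(groups)


def partition_reads(reads: list[Read]) -> tuple[list[set[int]], dict[int, list[int]]]:
    """Build the PSV graph, cluster PSVs, and assign reads to clusters."""
    weights = build_psv_graph(reads)
    nodes = {p for read in reads for p, alt in read.items() if alt}
    clusters = pivot_cluster(nodes, weights)
    return clusters, assign_reads(reads, clusters)


if __name__ == "__main__":
    # Toy example: PSVs 1-2 mark one paralog, PSVs 3-4 mark another.
    toy_reads = [
        {1: 1, 2: 1, 3: 0, 4: 0},
        {1: 1, 2: 1, 3: 0},
        {1: 0, 2: 0, 3: 1, 4: 1},
        {2: 0, 3: 1, 4: 1},
    ]
    clusters, groups = partition_reads(toy_reads)
    print(clusters)  # [{1, 2}, {3, 4}]
    print(groups)    # {0: [0, 1], 1: [2, 3]}
```

In the toy data, PSVs 1 and 2 attract each other (weight +2) and repel PSVs 3 and 4, so the two PSV groups separate cleanly and each read lands with its own paralog. Each resulting read group could then be assembled independently; this sketch leaves out sequencing-error handling and coverage filtering, which real data would require.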
