Article

Assembly of a pan-genome from deep sequencing of 910 humans of African descent

Journal

NATURE GENETICS
Volume 51, Issue 1, Pages 30 onward

Publisher

NATURE PUBLISHING GROUP
DOI: 10.1038/s41588-018-0273-y


Funding

  1. NIH [R01-HL129239, R01-HG006677, R01HL104608]
  2. NATIONAL HEART, LUNG, AND BLOOD INSTITUTE [R01HL129239, R01HL104608] Funding Source: NIH RePORTER
  3. NATIONAL HUMAN GENOME RESEARCH INSTITUTE [R01HG006677] Funding Source: NIH RePORTER
  4. NATIONAL INSTITUTE OF ALLERGY AND INFECTIOUS DISEASES [R01AI132476] Funding Source: NIH RePORTER
  5. NATIONAL INSTITUTE OF GENERAL MEDICAL SCIENCES [U54GM115428] Funding Source: NIH RePORTER


We used a deeply sequenced dataset of 910 individuals, all of African descent, to construct a set of DNA sequences that is present in these individuals but missing from the reference human genome. We aligned 1.19 trillion reads from the 910 individuals to the reference genome (GRCh38), collected all reads that failed to align, and assembled these reads into contiguous sequences (contigs). We then compared all contigs to one another to identify a set of unique sequences representing regions of the African pan-genome missing from the reference genome. Our analysis revealed 296,485,284 bp in 125,715 distinct contigs present in the populations of African descent, demonstrating that the African pan-genome contains ~10% more DNA than the current human reference genome. Although the functional significance of nearly all of this sequence is unknown, 387 of the novel contigs fall within 315 distinct protein-coding genes, and the rest appear to be intergenic.
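The workflow described above (align, keep the reads that failed to align, assemble them, then collapse redundant contigs) can be illustrated with a toy Python sketch. It is not the authors' pipeline: the assembly step is omitted, the SAM records and contigs are made up, the function names are hypothetical, and "redundant" is simplified to exact containment on either strand, whereas real pan-genome work relies on alignment tools that tolerate sequencing errors and variation. The ~3.1 Gb reference size used for the percentage check is an approximation of GRCh38, not a figure from the paper.

```python
"""Toy sketch: unaligned reads -> (assembly omitted) -> non-redundant contigs.

Hypothetical helpers, not the authors' code. Assumptions: reads arrive as SAM
text lines; contigs are supplied directly instead of being assembled; a contig
is "redundant" if it is an exact substring of a kept contig on either strand.
"""

SAM_FLAG_UNMAPPED = 0x4  # SAM FLAG bit meaning "segment unmapped"

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def unmapped_reads(sam_lines):
    """Yield (read_name, sequence) for SAM records with the unmapped bit set."""
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag, seq = fields[0], int(fields[1]), fields[9]  # QNAME, FLAG, SEQ
        if flag & SAM_FLAG_UNMAPPED:
            yield name, seq


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def non_redundant(contigs):
    """Keep contigs not contained, on either strand, in an already-kept contig."""
    kept = []
    # Longest first, so a contig can only be contained in one that is already kept.
    for contig in sorted(contigs, key=len, reverse=True):
        rc = reverse_complement(contig)
        if not any(contig in k or rc in k for k in kept):
            kept.append(contig)
    return kept


def percent_extra(novel_bp, reference_bp):
    """Novel sequence as a percentage of the reference length."""
    return 100.0 * novel_bp / reference_bp


# Usage with made-up records: r2 has FLAG 4 (unmapped), r1 is mapped.
sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tGGGTTT\tIIIIII",
]
print(list(unmapped_reads(sam)))  # [('r2', 'GGGTTT')]

# "TTACGTACGT" is the reverse complement of "ACGTACGTAA"; "TACG" lies inside it.
print(non_redundant(["ACGTACGTAA", "TACG", "TTACGTACGT", "CCCC"]))  # ['ACGTACGTAA', 'CCCC']

# Sanity check of the ~10% figure, assuming GRCh38 is roughly 3.1 Gb.
print(round(percent_extra(296_485_284, 3_100_000_000), 1))  # 9.6
```

Sorting longest first keeps the containment test one-directional; an all-against-all comparison of 125,715 real contigs would instead use an aligner rather than Python substring checks.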
