Article

An expanded sequence context model broadly explains variability in polymorphism levels across the human genome

Journal

NATURE GENETICS
Volume 48, Issue 4, Pages 349-+

Publisher

NATURE PORTFOLIO
DOI: 10.1038/ng.3511

Funding

  1. Alfred P. Sloan Foundation [BR2012-087]
  2. American Heart Association [13SDG14330006]
  3. W.W. Smith Charitable Trust [H1201]
  4. US National Institutes of Health/National Institute of Diabetes and Digestive and Kidney Diseases [R01DK101478]

The rate of single-nucleotide polymorphism varies substantially across the human genome and fundamentally influences evolution and the incidence of genetic disease. To study polymorphism levels across the genome, previous studies have considered only the nucleotides immediately flanking a polymorphic site (the site's trinucleotide sequence context). The impact of larger sequence contexts has therefore not been fully clarified, even though context substantially influences rates of polymorphism. Using a new statistical framework and data from the 1000 Genomes Project, we demonstrate that a heptanucleotide context explains >81% of the variability in substitution probabilities, highlighting new mutation-promoting motifs at ApT dinucleotides and at CAAT and TACG sequences. Our approach also identifies previously undocumented variability in C-to-T substitutions at CpG sites that is not immediately explained by differences in methylation intensity. Using our model, we present informative substitution intolerance scores for genes and a new intolerance score for amino acids, and we demonstrate the clinical utility of the model in neuropsychiatric diseases.
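The abstract does not describe the statistical framework itself. As a rough illustration of what "substitution probability in a heptanucleotide context" can mean, the sketch below computes the simplest empirical version: for every 7-mer (three flanking bases on each side of a central site), the fraction of its occurrences in which the central base is polymorphic to a given alternate allele. This is a hypothetical example, not the authors' method. The function name `context_substitution_probs`, the input layout (a reference string plus a dict mapping 0-based positions to alternate alleles), and the choice to skip contexts containing non-ACGT characters are all assumptions made for illustration.

```python
from collections import Counter


def context_substitution_probs(seq, snps, flank=3):
    """Hypothetical sketch: empirical P(central base -> alt | (2*flank+1)-mer).

    seq   -- reference sequence (assumed input format)
    snps  -- assumed dict of 0-based position -> alternate allele
    flank -- bases on each side; 3 gives a heptanucleotide context
    """
    seq = seq.upper()
    context_totals = Counter()       # occurrences of each context
    substitution_counts = Counter()  # (context, alt) -> polymorphic occurrences
    for i in range(flank, len(seq) - flank):
        context = seq[i - flank:i + flank + 1]
        if any(base not in "ACGT" for base in context):
            continue  # skip contexts containing N or other ambiguity codes
        context_totals[context] += 1
        alt = snps.get(i)
        if alt is not None and alt != seq[i]:
            substitution_counts[(context, alt)] += 1
    # Normalize each substitution count by how often its context occurs.
    return {
        (context, alt): n / context_totals[context]
        for (context, alt), n in substitution_counts.items()
    }


# The context AAACAAA occurs twice here; one occurrence carries a C>T SNP.
print(context_substitution_probs("AAACAAAAAACAAA", {3: "T"}))
```

A genome-scale analysis would also need to handle strand (for example, folding each context with its reverse complement) and could smooth estimates for rare contexts. Those refinements are left out here to keep the counting logic visible.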
