Article

Will Big Data Close the Missing Heritability Gap?

Journal

GENETICS
Volume 207, Issue 3, Pages 1135-1145

Publisher

Genetics Society of America
DOI: 10.1534/genetics.117.300271

Keywords

prediction of complex traits; big data; genomic prediction; whole-genome regressions; UK Biobank; Bayesian; BGLR; GenPred; Shared Data Resources; Genomic Selection

Funding

  1. National Institutes of Health [R01 GM-099992, R01 GM-101219]
  2. Michigan State University
  3. National Science Foundation, Directorate for Biological Sciences, Division of Integrative Organismal Systems [1444543]

Despite the important discoveries reported by genome-wide association (GWA) studies, for most traits and diseases the prediction R-squared (R-sq.) achieved with genetic scores remains considerably lower than the trait heritability. Modern biobanks will soon deliver unprecedentedly large biomedical data sets: Will the advent of big data close the gap between the trait heritability and the proportion of variance that can be explained by a genomic predictor? We addressed this question using Bayesian methods and a data analysis approach that produces a response surface relating prediction R-sq. to sample size and model complexity (e.g., number of SNPs). We applied the methodology to data from the interim release of the UK Biobank. Focusing on human height as a model trait and using 80,000 records for model training, we achieved a prediction R-sq. in testing (n = 22,221) of 0.24 (95% CI: 0.23-0.25). Our estimates show that prediction R-sq. increases with sample size, reaching an estimated plateau at values that ranged from 0.1 to 0.37 for models using 500 and 50,000 (GWA-selected) SNPs, respectively. Soon much larger data sets will become available. Using the estimated response surface, we forecast that larger sample sizes will lead to further improvements in prediction R-sq. We conclude that big data will lead to a substantial reduction of the gap between trait heritability and the proportion of interindividual differences that can be explained with a genomic predictor. However, even with the power of big data, for complex traits we anticipate that the gap between prediction R-sq. and trait heritability will not be fully closed.
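
The shape of the relationship described in the abstract (prediction R-sq. rising with sample size and leveling off at a plateau that depends on the SNP set) can be illustrated with a simple saturating curve. The sketch below is not the authors' Bayesian procedure. It assumes a commonly used approximation in which expected R-sq. equals plateau * (n * plateau) / (n * plateau + m_eff). The function name `expected_r2` and the parameter `m_eff` (an effective number of independent effects) are hypothetical, and the value given to `m_eff` is purely illustrative. Only the plateau of 0.37 is taken from the abstract.

```python
def expected_r2(n, plateau, m_eff):
    """Approximate prediction R-sq. for a training sample of size n.

    plateau: variance explained by the SNP set with unlimited data
             (0.37 is the value the abstract reports for 50,000 SNPs).
    m_eff:   hypothetical effective number of independent effects;
             larger values mean slower convergence to the plateau.
    """
    if n < 0 or m_eff <= 0 or not 0 < plateau <= 1:
        raise ValueError("need n >= 0, m_eff > 0 and 0 < plateau <= 1")
    signal = n * plateau
    return plateau * signal / (signal + m_eff)


if __name__ == "__main__":
    # Illustrative only: m_eff is an assumed value, not an estimate from the paper.
    for n in (10_000, 80_000, 500_000, 5_000_000):
        print(n, round(expected_r2(n, plateau=0.37, m_eff=50_000), 3))
```

Under this assumed form the curve never exceeds the plateau, which is one way to see why a gap between prediction R-sq. and trait heritability can persist even at very large sample sizes.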
