Article

SNP characteristics and validation success in genome wide association studies

Journal

Human Genetics
Volume 141, Issue 2, Pages 229-238

Publisher

Springer
DOI: 10.1007/s00439-021-02407-8

Funding

  1. National Institutes of Health [U19CA203654, U19CA203654S1, R01CA231141, P01 CA206980-01A1]
  2. Cancer Prevention and Research Institute of Texas Grant [RR170048]

This study aimed to identify SNP features associated with validation success in GWAS. The level of statistical significance and the effect size in the discovery GWAS were the strongest predictors of validation success. The risk allele frequency, genomic location, and evolutionary conservation of SNPs were also associated with the validation success rate. Additionally, validation success was higher when the discovery and validation GWASs targeted the same ethnicity.
Genome-wide association studies (GWASs) have identified tens of thousands of single nucleotide polymorphisms (SNPs) associated with human diseases and characteristics. A significant fraction of GWAS findings can be false positives. The gold standard for establishing a true positive is independent validation. The goal of this study was to identify SNP features associated with validation success. Summary statistics from the Catalog of Published GWASs were used in the analysis. Since our goal was an analysis of reproducibility, we focused on the diseases/phenotypes targeted by at least 10 GWASs. GWASs were arranged in discovery-validation pairs based on the time of publication, with the discovery GWAS published before the validation GWAS. We used four definitions of validation success that differ in stringency. Associations of SNP features with validation success were consistent across the definitions. The strongest predictor of SNP validation was the level of statistical significance in the discovery GWAS. The magnitude of the effect size was associated with validation success in a non-linear manner. SNPs with risk allele frequencies in the range 30-70% showed a higher validation success rate than rarer or more common SNPs. Missense, 5'UTR, and stop-gained SNPs, as well as SNPs located in transcription factor binding sites, had a higher validation success rate than intergenic, intronic, and synonymous SNPs. There was a positive association between validation success and the level of evolutionary conservation of the sites. In addition, validation success was higher when the discovery and validation GWASs targeted the same ethnicity. All predictors of validation success remained significant in a multivariate logistic regression model, indicating that each contributes independently. To conclude, we identified SNP features that predict validation success of GWAS hits. These features can be used to select SNPs for validation and downstream functional studies.
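The pairing scheme described above (discovery GWAS published before validation GWAS, traits with at least 10 GWASs) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes a hypothetical record layout (a Gwas class holding a trait, a publication date, and a dict mapping reported SNP ids to risk allele frequencies) and a deliberately simple, hypothetical success criterion: a discovery SNP counts as validated if a later GWAS of the same trait also reports it. The paper's four definitions are not spelled out here, so this criterion is only a stand-in. The 30-70% risk allele frequency bin mirrors the range reported in the abstract.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from itertools import combinations


@dataclass
class Gwas:
    """Hypothetical record for one published GWAS."""
    study_id: str
    trait: str
    published: date
    hits: dict  # reported SNP id -> risk allele frequency (0-1)


def discovery_validation_pairs(studies, min_studies=10):
    """Yield (discovery, validation) pairs within each trait.

    Only traits with at least `min_studies` GWASs are kept, and the
    discovery GWAS must be published strictly before the validation GWAS.
    """
    by_trait = defaultdict(list)
    for study in studies:
        by_trait[study.trait].append(study)
    for group in by_trait.values():
        if len(group) < min_studies:
            continue
        group.sort(key=lambda s: s.published)
        # combinations() on a date-sorted list keeps the earlier study first
        for earlier, later in combinations(group, 2):
            if earlier.published < later.published:  # drop same-day ties
                yield earlier, later


def raf_bin(raf):
    """Split risk allele frequency into the 30-70% range vs everything else."""
    return "30-70%" if 0.30 <= raf <= 0.70 else "other"


def success_rate_by_raf(studies, min_studies=10):
    """Fraction of discovery SNPs re-reported by a later GWAS, per RAF bin.

    Hypothetical success criterion: the SNP id appears among the
    validation GWAS's reported hits.
    """
    tested = defaultdict(int)
    validated = defaultdict(int)
    for discovery, validation in discovery_validation_pairs(studies, min_studies):
        for snp, raf in discovery.hits.items():
            b = raf_bin(raf)
            tested[b] += 1
            validated[b] += snp in validation.hits  # bool adds as 0 or 1
    return {b: validated[b] / tested[b] for b in tested}


if __name__ == "__main__":
    # Toy data (invented for illustration); min_studies lowered to 2
    toy = [
        Gwas("A", "trait_x", date(2010, 1, 1), {"rs1": 0.50, "rs2": 0.05}),
        Gwas("B", "trait_x", date(2012, 1, 1), {"rs1": 0.45, "rs3": 0.90}),
        Gwas("C", "trait_x", date(2015, 1, 1), {"rs1": 0.50, "rs2": 0.06}),
    ]
    print(success_rate_by_raf(toy, min_studies=2))
    # -> {'30-70%': 1.0, 'other': 0.3333333333333333}
```

In this sketch each discovery SNP is counted once per later GWAS of the same trait, so traits with many follow-up studies contribute more SNP-pair observations; the paper's actual counting and its multivariate logistic regression over all predictors are not reproduced here.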
