Article

Using off-target data from whole-exome sequencing to improve genotyping accuracy, association analysis and polygenic risk prediction

Journal

BRIEFINGS IN BIOINFORMATICS
Volume 22, Issue 3, Pages -

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bib/bbaa084

Keywords

whole-exome sequencing; linkage disequilibrium; low-coverage off-target data; genome-wide association study; polygenic risk score

Funding

  1. Natural Science Foundation of China [NSFC 81973148]
  2. Biomedical Research Council [BMRC 03/1/27/18/216]
  3. National Medical Research Council [0838/2004]
  4. National Research Foundation [BMRC 05/1/21/19/425, 11/1/21/19/678]
  5. Ministry of Health, Singapore
  6. Agency for Science, Technology and Research, Singapore
  7. Merck Sharp & Dohme Corp., Whitehouse Station, NJ, USA

WEScall is a genotype calling pipeline that reduces genotype discordance rates in WES target regions and also yields accurate genotypes from sparse off-target data. Analysing WES data with WEScall can identify significant loci associated with metabolic traits and improve the accuracy of polygenic risk prediction.
Whole-exome sequencing (WES) has been widely used to study the role of protein-coding variants in genetic diseases. Non-coding regions, typically covered by sparse off-target data, are often discarded by conventional WES analyses. Here, we develop a genotype calling pipeline named WEScall to analyse both target and off-target data. We leverage linkage disequilibrium shared within study samples and from an external reference panel to improve genotyping accuracy. In an application to WES data from 2527 Chinese and Malay individuals, WEScall reduces the genotype discordance rate from 0.26% (SE = 6.4 × 10^-6) to 0.08% (SE = 3.6 × 10^-6) across 1.1 million single nucleotide polymorphisms (SNPs) in the deeply sequenced target regions. Furthermore, we obtain genotypes at a 0.70% (SE = 3.0 × 10^-6) discordance rate across 5.2 million off-target SNPs, which had ~1.2x mean sequencing depth. Using this dataset, we perform genome-wide association studies of 10 metabolic traits. Despite our small sample size, we identify 10 loci at genome-wide significance (P < 5 × 10^-8), including eight well-established loci. The two novel loci, both associated with glycated haemoglobin levels, are GPATCH8-SLC4A1 (rs369762319, P = 2.56 × 10^-12) and ROR2 (rs1201042, P = 3.24 × 10^-8). Finally, using summary statistics from UK Biobank and Biobank Japan, we show that polygenic risk prediction can be significantly improved for six out of nine traits by incorporating off-target data (P < 0.01). These results demonstrate WEScall as a useful tool to facilitate WES studies with decent amounts of off-target data.
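
The genotype discordance rate quoted above is the fraction of genotype calls that disagree with a reference set of genotypes for the same samples and sites. The sketch below is a minimal illustration, not WEScall code: the function name `discordance_rate`, the 0/1/2 allele-count encoding, the use of `None` for missing genotypes, and the simple binomial standard error are all assumptions made here, and the paper may estimate its standard errors differently.

```python
import math


def discordance_rate(calls, truth):
    """Return (rate, standard_error, n_compared) for two genotype lists.

    Genotypes are assumed to be alternate-allele counts (0, 1 or 2);
    None marks a missing genotype and that pair is skipped.
    The standard error is a plain binomial approximation (an assumption).
    """
    compared = 0
    mismatched = 0
    for called, true in zip(calls, truth):
        if called is None or true is None:
            continue  # cannot compare a missing genotype
        compared += 1
        if called != true:
            mismatched += 1
    if compared == 0:
        raise ValueError("no comparable genotypes")
    rate = mismatched / compared
    se = math.sqrt(rate * (1 - rate) / compared)
    return rate, se, compared


# Toy data: ten sample-site pairs, two with a missing genotype, one mismatch.
calls = [0, 1, 2, 1, None, 0, 2, 1, 0, 0]
truth = [0, 1, 2, 2, 1, 0, 2, 1, 0, None]
rate, se, n = discordance_rate(calls, truth)
print(f"discordance = {rate:.3%} over {n} genotypes (SE = {se:.3g})")
```

In a real comparison the two lists would be flattened over all samples and SNPs shared between the sequencing calls and an independent genotyping source, so `n` would run to millions or billions of genotypes.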

