4.6 Article

Improving variant calling using population data and deep learning

Journal

BMC BIOINFORMATICS
Volume 24, Issue 1

Publisher

BMC
DOI: 10.1186/s12859-023-05294-0

In this study, population-aware DeepVariant models were developed to improve both the precision and recall of variant calling in single samples. By adding an input channel that encodes allele frequencies from the 1000 Genomes Project, the model reduced variant calling errors and, cohort-wide, reduced the number of rare homozygous and pathogenic ClinVar calls. The study also found that diverse reference panels yielded greater accuracy than population-specific panels, even when the panel population matched the sample ancestry.
Large-scale population variant data is often used to filter and aid interpretation of variant calls in a single sample. These approaches do not incorporate population information directly into the process of variant calling, and are often limited to filtering, which trades recall for precision. In this study, we develop population-aware DeepVariant models with a new channel encoding allele frequencies from the 1000 Genomes Project. This model reduces variant calling errors, improving both precision and recall in single samples, and reduces rare homozygous and pathogenic ClinVar calls cohort-wide. We assess the use of population-specific or diverse reference panels, finding the greatest accuracy with diverse panels, suggesting that large, diverse panels are preferable to individual populations, even when the population matches sample ancestry. Finally, we show that this benefit generalizes to samples with different ancestry from the training data even when the ancestry is also excluded from the reference panel.
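
The abstract's remark that post hoc filtering "trades recall for precision" can be illustrated with a toy calculation. The sketch below is not the authors' method and does not use DeepVariant; the `Call` record, the site names, the allele frequencies, and the truth-set size are all hypothetical values chosen for illustration. It removes every call whose allele is absent from a reference panel and compares precision and recall before and after.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    site: str        # hypothetical variant identifier
    is_true: bool    # whether the call matches the (toy) truth set
    panel_af: float  # allele frequency in a reference panel, 0.0 if absent


def precision_recall(calls, total_true):
    """Return (precision, recall) for a list of calls against a truth set size."""
    tp = sum(c.is_true for c in calls)
    precision = tp / len(calls) if calls else 0.0
    recall = tp / total_true if total_true else 0.0
    return precision, recall


# Hypothetical call set: three true calls and three false positives.
calls = [
    Call("chr1:1000A>G", True, 0.31),
    Call("chr1:2000C>T", True, 0.02),
    Call("chr1:3000G>A", True, 0.0),    # real but rare, absent from the panel
    Call("chr1:4000T>C", False, 0.0),   # false positive, absent from the panel
    Call("chr1:5000A>T", False, 0.0),   # false positive, absent from the panel
    Call("chr1:6000G>C", False, 0.12),  # false positive at a known site
]
TOTAL_TRUE = 4  # assumed truth-set size; one true variant was never called

before = precision_recall(calls, TOTAL_TRUE)

# Post hoc filter: keep only calls whose allele is seen in the panel.
filtered = [c for c in calls if c.panel_af > 0.0]
after = precision_recall(filtered, TOTAL_TRUE)

print(f"before filtering: precision={before[0]:.2f}, recall={before[1]:.2f}")
print(f"after filtering:  precision={after[0]:.2f}, recall={after[1]:.2f}")
```

In this toy data the filter discards two false positives but also the rare true variant, so precision rises while recall falls. A caller that takes allele frequency as an input, as the abstract describes, can instead weigh that prior against the read evidence at each site.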

