4.5 Article

Machine learning models for accurate prioritization of variants of uncertain significance

Journal

HUMAN MUTATION
Volume 43, Issue 4, Pages 449-460

Publisher

WILEY-HINDAWI
DOI: 10.1002/humu.24339

Keywords

genetic diagnosis; machine learning; pathogenicity prediction; variant interpretation; variants of uncertain significance

Funding

  1. Universidad de los Andes

This manuscript compares three machine learning methods for classifying VUS as Pathogenic or Non-pathogenic and presents an open-source software tool that prioritizes VUS using a Random Forest model, with the aim of improving the process of genetic diagnosis.
The growing use of next-generation sequencing technologies in genetic diagnosis has produced an exponential increase in the number of variants of uncertain significance (VUS). In this manuscript, we compare three machine learning methods for classifying VUS as Pathogenic or Non-pathogenic: a Random Forest (RF), a Support Vector Machine (SVM), and a Multilayer Perceptron. To train the models, we extracted high-quality variants from ClinVar that were previously classified as VUS. For each variant, we retrieved nine conservation scores, the output of a loss-of-function tool, and allele frequencies. For the RF and SVM models, hyperparameters were tuned using cross-validation with a grid search. The three models were tested on a nonoverlapping set of variants that had been classified as VUS during the preceding 3 years but were reclassified in August 2020. On this set, all three models were more accurate than the benchmarked tools. The RF-based model performed best across variant types and was used to create VusPrize, an open-source software tool for prioritization of VUS. We believe that our model can improve the process of genetic diagnosis in research and clinical settings.
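The tuning step described in the abstract (cross-validation with a grid search over Random Forest hyperparameters, followed by evaluation on a held-out set) can be illustrated with a minimal sketch. This is not the VusPrize implementation: it assumes scikit-learn, uses synthetic data in place of the ClinVar variants, and the feature count, parameter grid, and fold count are illustrative assumptions rather than values from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the variant feature table (assumption: 11 numeric
# features, loosely mirroring conservation scores, a loss-of-function score,
# and an allele frequency; the real feature set is not reproduced here).
X, y = make_classification(
    n_samples=400, n_features=11, n_informative=6, random_state=0
)

# Hold out a nonoverlapping test set, stratified by class label.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical hyperparameter grid; the values are illustrative only.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}

# Grid search with 5-fold cross-validation on the training set.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)

# Prioritize held-out variants by predicted probability of class 1
# (assumed here to stand for "Pathogenic"), highest first.
pathogenic_probability = search.predict_proba(X_test)[:, 1]
ranking = pathogenic_probability.argsort()[::-1]

print("Best hyperparameters:", best_params)
print("Held-out accuracy:", round(test_accuracy, 3))
print("Top 5 prioritized test indices:", ranking[:5].tolist())
```

Ranking by predicted probability, rather than reporting only the hard class label, is one simple way to turn a classifier into a prioritization tool; whether VusPrize ranks variants this way is not stated in the abstract.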
