4.7 Article

Genetic Analysis of Coronary Artery Disease Using Tree-Based Automated Machine Learning Informed By Biology-Based Feature Selection

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TCBB.2021.3099068

Keywords

Bioinformatics; Pipelines; Genomics; Drugs; Biology; Solid modeling; Frequency selective surfaces; Automated machine learning; coronary artery disease; genome-wide association studies; SHAP values

Funding

  1. NIH [LM010098]


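Automated machine learning (AutoML) tools, such as TPOT, are increasingly used in biomedical applications. This study assessed the applicability of TPOT to genomics and sought SNP combinations associated with coronary artery disease (CAD). Analysis of U.K. Biobank data identified a recurrent signal from a group of 28 SNPs, and individual-level analysis of SNP contributions revealed distinct clusters of well-predicted CAD cases, a promising step towards precision medicine.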
Machine learning (ML) approaches are increasingly being used in biomedical applications. Important challenges of ML include choosing the right algorithm and tuning the parameters for optimal performance. Automated ML (AutoML) methods, such as the Tree-based Pipeline Optimization Tool (TPOT), have been developed to take some of the guesswork out of ML, thus making this technology available to users from more diverse backgrounds. The goals of this study were to assess the applicability of TPOT to genomics and to identify combinations of single nucleotide polymorphisms (SNPs) associated with coronary artery disease (CAD), with a focus on genes with a high likelihood of being good CAD drug targets. We leveraged public functional genomic resources to group SNPs into biologically meaningful sets to be selected by TPOT. We applied this strategy to data from the U.K. Biobank, detecting a strikingly recurrent signal stemming from a group of 28 SNPs. Importance analysis of these SNPs uncovered functional relevance of the top SNPs to genes whose association with CAD is supported in the literature and other resources. Furthermore, we employed game-theory-based metrics to study SNP contributions to individual-level TPOT predictions and to discover distinct clusters of well-predicted CAD cases. The latter indicates a promising approach towards precision medicine.
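
The game-theory-based metrics mentioned in the abstract (SHAP values, per the keywords) build on Shapley values, which split one individual's prediction into per-feature contributions. The following is a minimal sketch of an exact Shapley computation, not the authors' pipeline: the `toy_risk` function, the three-SNP genotype, and the all-zero reference genotype are hypothetical and chosen only for illustration. Practical SHAP implementations rely on approximations or model-specific algorithms rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample x relative to a baseline sample.

    Features outside a coalition are set to their baseline value.
    Cost grows exponentially with the number of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_risk(g):
    """Hypothetical risk score: one additive SNP plus a two-SNP interaction."""
    return 0.1 * g[0] + 0.3 * g[1] * g[2]


genotype = [2, 1, 1]   # minor-allele counts for three hypothetical SNPs
reference = [0, 0, 0]  # assumed baseline genotype
phi = shapley_values(toy_risk, genotype, reference)
print(phi)
```

The contributions sum to the difference between the individual's prediction and the baseline prediction, and the interaction term is shared equally between the two SNPs that produce it. Per-individual vectors of this kind are the sort of input that can be clustered to look for subgroups of cases.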

