Single Nucleotide Polymorphism relevance learning with Random Forests for Type 2 diabetes risk prediction

ARTIFICIAL INTELLIGENCE IN MEDICINE (2018)

Journal

ARTIFICIAL INTELLIGENCE IN MEDICINE

Volume 85, Issue -, Pages 43-49

Publisher

ELSEVIER

DOI: 10.1016/j.artmed.2017.09.005

Keywords

Type 2 diabetes; Random Forest; Feature learning; Predictive model; Gini importance

Funding

European Union's Horizon 2020 research and innovation programme (H2020 Societal Challenges Programme) [689810]
University of Girona [MPCUdG2016]
Spanish MINECO [DPI2013-47450-C21-R]

Abstract

Objective: Using artificial intelligence techniques to identify which Single Nucleotide Polymorphisms (SNPs) promote the development of a disease is one of the focuses of medical research, as such techniques may aid early diagnosis and the prescription of preventive measures. In particular, the aim is to help physicians identify the SNPs relevant to Type 2 diabetes and to build a decision-support tool for risk prediction.

Methods: We use the Random Forest (RF) technique to search for the most important attributes (SNPs) related to diabetes, assigning each attribute a weight (degree of importance) between 0 and 1. Support Vector Machines and Logistic Regression, two other machine learning techniques that are well established in the health community, are also applied, and their performance is compared with that of RF. Furthermore, the attribute relevance obtained with RF is used to make predictions with the k-Nearest Neighbour method, weighting each attribute in the similarity measure according to its RF relevance.

Results: Testing is performed on a set of 677 subjects. RF is able to handle the complexity of feature interactions, overfitting, and unknown attribute values, providing the SNPs' relevance and achieving an area under the ROC curve of up to 0.89 for risk prediction. RF outperforms all the other tested machine learning techniques in terms of prediction accuracy and of the stability of the estimated attribute relevance.

Conclusions: Random Forest is a useful method for learning predictive models and the relevance of SNPs without any underlying assumption. (C) 2017 Elsevier B.V. All rights reserved.
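The relevance-weighted k-Nearest Neighbour step described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the genotype coding (0/1/2), the mismatch-based distance, the toy samples, and the weight values are all assumptions made for illustration. In practice the weights would come from the importance scores of a fitted Random Forest.

```python
from collections import Counter


def weighted_distance(a, b, weights):
    """Sum the weights of the SNP positions where two genotype vectors differ.

    Assumption: a simple mismatch distance; the abstract does not specify
    the similarity measure used in the paper.
    """
    return sum(w * (x != y) for x, y, w in zip(a, b, weights))


def knn_predict(query, samples, labels, weights, k=3):
    """Return the majority label among the k samples closest to `query`."""
    ranked = sorted(
        range(len(samples)),
        key=lambda i: weighted_distance(query, samples[i], weights),
    )
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: three SNPs per subject, coded as 0, 1 or 2.
samples = [
    [0, 2, 0],
    [0, 1, 0],
    [2, 0, 1],
    [1, 0, 1],
]
labels = ["case", "case", "control", "control"]

# Hypothetical relevance weights in [0, 1], standing in for RF importances.
rf_weights = [0.7, 0.2, 0.1]
uniform_weights = [1 / 3, 1 / 3, 1 / 3]

query = [0, 0, 1]
weighted_label = knn_predict(query, samples, labels, rf_weights, k=3)
uniform_label = knn_predict(query, samples, labels, uniform_weights, k=3)
print(weighted_label, uniform_label)
```

With these toy values, the weighted vote follows the first SNP, which carries most of the relevance, whereas uniform weights let the two low-relevance SNPs dominate the neighbour ranking.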
