Article

Machine learning in prediction of genetic risk of nonsyndromic oral clefts in the Brazilian population

Journal

CLINICAL ORAL INVESTIGATIONS
Volume 25, Issue 3, Pages 1273-1280

Publisher

SPRINGER HEIDELBERG
DOI: 10.1007/s00784-020-03433-y

Keywords

Nonsyndromic oral cleft; Single nucleotide polymorphism; Machine learning; Genetic counseling; Brazilian population

Funding

  1. Sao Paulo Research Foundation (FAPESP), Sao Paulo, Brazil [2016/026670]
  2. National Postdoctoral Program of the Coordination of Training of Higher Education Graduate Foundation (PNPD/CAPES), Brasilia, Brazil
  3. Scientific Initiation Program of The National Council for Scientific and Technological Development (PIBIC/CNPq), Brasilia, Brazil

Using machine learning methods, 13 key SNPs were identified that effectively predict the risk of NSCL +/- P in the Brazilian population. These SNPs ranked as highly important in both the neural network and the random forest methods, and their associated genes are involved in crucial biological processes such as tissue development, neural tube closure, and metabolism.

Objectives

Genetic variants in multiple genes and loci have been associated with the risk of nonsyndromic cleft lip with or without cleft palate (NSCL +/- P). However, estimating risk remains challenging because most of these variants are population-specific, which makes the underlying genetic risk difficult to identify. Herein we examined the use of machine learning on previously reported single nucleotide polymorphisms (SNPs) to predict the risk of NSCL +/- P in the Brazilian population.

Materials and methods

Random forest and neural network methods were applied to 72 SNPs in a case-control sample composed of 722 NSCL +/- P cases and 866 controls to discriminate NSCL +/- P risk. SNP-SNP interactions and the functional annotation of biological processes associated with the identified NSCL +/- P risk genes were also examined.

Results

Supervised random forest decision trees revealed high importance scores for the SNPs rs11717284 and rs1875735 in FGF12, rs41268753 in GRHL3, rs2236225 in MTHFD1, rs2274976 in MTHFR, rs2235371 and rs642961 in IRF6, rs17085106 in RHPN2, rs28372960 in TCOF1, rs7078160 in VAX1, rs10762573 and rs2131960 in VCL, and rs227731 in 17q22, with an accuracy of 99% and an error rate of approximately 3% to predict the risk of NSCL +/- P. Those same 13 SNPs were considered the most important for the neural network to effectively predict NSCL +/- P risk, with an overall accuracy of 94%. A multivariate regression model revealed significant interactions among all SNPs, with the exception of those in FGF12 and MTHFD1. The most significant biological processes for the selected genes were those involved in tissue and epithelium development; neural tube closure; and metabolism of methionine, folate, and homocysteine.

Conclusions

Our results provide novel clues for studies of the genetic mechanisms of NSCL +/- P and point to a machine learning model composed of 13 SNPs that is capable of predicting NSCL +/- P risk.
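To make the idea of an SNP "importance score" concrete, the following is a minimal, standard-library-only sketch. It is not the authors' pipeline: the abstract does not state the software, the genotype encoding, or any hyperparameters. Everything here is an assumption for illustration: genotypes are coded as minor-allele counts (0, 1, 2), the data are synthetic with a single risk SNP (`SNP_0`), and a full random forest is replaced by a forest of single-split trees (stumps) grown on bootstrap samples with a random subset of SNPs per tree. Importance is the accumulated decrease in Gini impurity, normalized to sum to 1.

```python
import random

def simulate_genotypes(n_samples=400, n_snps=6, seed=1):
    """Synthetic data (assumption): SNP_0 raises case probability, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        genotype = [rng.randint(0, 2) for _ in range(n_snps)]  # minor-allele counts
        p_case = 0.2 + 0.3 * genotype[0]                       # 0.2, 0.5 or 0.8
        X.append(genotype)
        y.append(1 if rng.random() < p_case else 0)
    return X, y

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(X, y, features):
    """Return (impurity decrease, feature index, threshold) of the best single split."""
    parent = gini(y)
    best = (0.0, None, None)
    for f in features:
        for t in (0, 1):  # genotype <= t goes left, genotype > t goes right
            left = [yi for xi, yi in zip(X, y) if xi[f] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[f] > t]
            if not left or not right:
                continue
            child = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            gain = parent - child
            if gain > best[0]:
                best = (gain, f, t)
    return best

def stump_forest_importance(X, y, n_trees=200, seed=0):
    """Normalized Gini importance from bootstrapped stumps (a reduced random forest)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    m = max(1, int(p ** 0.5))                        # SNPs considered per tree
    importance = [0.0] * p
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample with replacement
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        gain, f, _ = best_split(Xb, yb, rng.sample(range(p), m))
        if f is not None:
            importance[f] += gain
    total = sum(importance)
    return [v / total for v in importance] if total else importance

if __name__ == "__main__":
    X, y = simulate_genotypes()
    for j, score in enumerate(stump_forest_importance(X, y)):
        print(f"SNP_{j}: {score:.3f}")
```

On this synthetic data the simulated risk SNP should receive the largest share of the importance. In a real analysis, importance rankings would be derived from full-depth trees and validated on held-out samples rather than on the training data.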
