4.1 Article

Machine learning in schizophrenia genomics, a case-control study using 5,090 exomes

Publisher

WILEY
DOI: 10.1002/ajmg.b.32638

Keywords

artificial intelligence; diagnostic; genomic; prediction; psychosis

Funding

  1. Montreal Children's Hospital Foundation Funding Source: Medline

Ask authors/readers for more resources

Our hypothesis is that machine learning (ML) analysis of whole exome sequencing (WES) data can be used to identify individuals at high risk for schizophrenia (SCZ). This study applies ML to WES data from 2,545 individuals with SCZ and 2,545 unaffected individuals, accessed via the database of genotypes and phenotypes (dbGaP). Single nucleotide variants and small insertions and deletions were annotated by ANNOVAR using the reference genome hg19/GRCh37. Rare (predicted functional) variants with a minor allele frequency <= 1% and genotype quality >= 90 including missense, frameshift, stop gain, stop loss, intronic, and exonic splicing variants were selected. A file containing all cases and controls, the names of genes with variants meeting our criteria, and the number of variants per gene for each individual, was used for ML analysis. The supervised machine-learning algorithm used the patterns of variants observed in the different genes to determine which subset of genes can best predict that an individual is affected. Seventy percent of the data was used to train the algorithm and the remaining 30% of data (n = 1,526) was used to evaluate its efficiency. The supervised ML algorithm, gradient boosted trees with regularization (eXtreme Gradient Boosting implementation) was the best performing algorithm yielding promising results (accuracy: 85.7%, specificity: 86.6%, sensitivity: 84.9%, area under the receiver-operator characteristic curve: 0.95). The top 50 features (genes) of the algorithm were analyzed using bioinformatics resources for new insights about the pathophysiology of SCZ. This manuscript presents a novel predictor which could potentially enable studies exploring disease-modifying intervention in the early stages of the disease.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.1
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available