4.7 Article

Transcriptomics and machine learning to advance schizophrenia genetics: A case-control study using post-mortem brain data

Journal

Publisher

ELSEVIER IRELAND LTD
DOI: 10.1016/j.cmpb.2021.106590

Keywords

Schizophrenia; Transcriptomics; Machine learning; Bioinformatics; Post-mortem

Funding

  1. McGill University Health Centre Research Institute
  2. Canada First Research Excellence Fund (McGill University Healthy Brains for Healthy Lives Initiative)
  3. FRQS Chercheur Boursier Clinicien salary award


This study evaluates the performance of machine learning in classifying schizophrenia cases and controls based on gene expression microarray data. The results show above-chance performance in classification and suggest that ML analysis of gene expressions can contribute to our understanding of schizophrenia's pathophysiology and aid in identifying novel treatments.
Background and Objective: Alterations in the expression of a variety of genes have been reported in patients with schizophrenia (SCZ). Moreover, machine learning (ML) analysis of gene expression microarray data has shown promising preliminary results in the study of SCZ. Our objective was to evaluate the performance of ML in classifying SCZ cases and controls based on gene expression microarray data from the dorsolateral prefrontal cortex.
Methods: We applied a state-of-the-art ML algorithm (XGBoost) to train and evaluate a classification model using 201 SCZ cases and 278 controls. We used 10-fold cross-validation for model selection and a held-out testing set to evaluate the model. Classification performance was measured by the area under the receiver operating characteristic curve (AUC).
Results: We report an average AUC of 0.76 on 10-fold cross-validation and an AUC of 0.76 on held-out testing data not used during training. Analysis of the rolling balanced classification accuracy from high to low prediction confidence levels showed that the balanced accuracy of the most confident subset of predictions ranged between 80% and 90%. The ML model utilized 182 gene expression probes. Classification performance improved further when an automated ML strategy was applied to the 182 features, achieving an AUC of 0.79 on the same testing data. We found literature evidence linking all of the top ten ML-ranked genes to SCZ. Furthermore, we leveraged information from the full set of microarray gene expressions available via univariate differential gene expression analysis. We then prioritized differentially expressed gene sets using the piano gene set analysis package. We augmented the ranking of the prioritized gene sets with genes from the complex multivariate ML model, using hypergeometric tests to identify more robust gene sets. We identified two significant Gene Ontology molecular function gene sets: "oxidoreductase activity, acting on the CH-NH2 group of donors" and "integrin binding". Lastly, we present candidate treatments for SCZ based on findings from our study.
Conclusions: Overall, we observed above-chance performance from ML classification of SCZ cases and controls based on brain gene expression microarray data, and found that ML analysis of gene expressions could further our understanding of the pathophysiology of SCZ and help identify novel treatments.
(C) 2021 Elsevier B.V. All rights reserved.
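
The hypergeometric test mentioned in the Results can be illustrated with a minimal sketch. It is not the authors' code: the function name, the argument names, and all numbers in the usage example other than the 182 ML-selected probes are hypothetical, and the abstract does not state the exact test configuration used. The sketch computes the upper-tail probability of seeing at least the observed overlap between a gene set and the ML-selected genes if those genes had been drawn at random from the array background.

```python
from math import comb


def hypergeom_enrichment_pvalue(n_background, n_in_set, n_selected, n_overlap):
    """Upper-tail hypergeometric p-value P(X >= n_overlap).

    n_background: total genes measured (the sampling universe)
    n_in_set:     genes in the gene set of interest
    n_selected:   genes selected by the ML model
    n_overlap:    selected genes that also belong to the gene set
    """
    if not 0 <= n_in_set <= n_background:
        raise ValueError("n_in_set must lie between 0 and n_background")
    if not 0 <= n_selected <= n_background:
        raise ValueError("n_selected must lie between 0 and n_background")

    max_overlap = min(n_in_set, n_selected)
    total = comb(n_background, n_selected)  # all equally likely selections

    # Sum the probability mass for every overlap at least as large as observed.
    # math.comb returns 0 for impossible combinations, so no extra bounds check
    # is needed on the lower end.
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_selected - k)
        for k in range(max(n_overlap, 0), max_overlap + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers for illustration only (182 is from the abstract;
    # the background size, gene-set size, and overlap are assumptions).
    p = hypergeom_enrichment_pvalue(
        n_background=20000, n_in_set=100, n_selected=182, n_overlap=5
    )
    print(f"enrichment p-value: {p:.3g}")
```

In practice the p-values for many gene sets would need correction for multiple testing before any set is called significant; the abstract does not say which correction was applied.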

