4.7 Article

Disease classification for whole-blood DNA methylation: Meta-analysis, missing values imputation, and XAI

Journal

GIGASCIENCE
Volume 11, Issue -, Pages -

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/gigascience/giac097

Keywords

DNA methylation; machine learning; data harmonization; explainable artificial intelligence

Funding

  1. Ministry of Science and Higher Education of the Russian Federation
  2. Grant for Major Research Projects in Priority Areas of Scientific and Technological Development [075-15-2020-808]


The study proposes a comprehensive approach for classifying controls and patients using combined DNA methylation datasets. The method includes data harmonization, construction of machine learning classification models, dimensionality reduction, imputation of missing values, and explanation of model predictions. The results show that harmonization improves classification accuracy, that tree ensembles achieve the best accuracy, that dimensionality reduction does not harm accuracy, and that the best imputation methods achieve almost the same accuracy on data with missing values as on the original data.
Background: DNA methylation has a significant effect on gene expression and can be associated with various diseases. Meta-analysis of available DNA methylation datasets requires development of a specific workflow for joint data processing.

Results: We propose a comprehensive approach to analyzing combined DNA methylation datasets in order to classify controls and patients. The solution includes data harmonization, construction of machine learning classification models, dimensionality reduction of models, imputation of missing values, and explanation of model predictions by explainable artificial intelligence (XAI) algorithms. We show that harmonization can improve classification accuracy by up to 20% when the preprocessing methods of the training and test datasets differ. The best accuracy was obtained with tree ensembles, reaching above 95% for Parkinson's disease. Dimensionality reduction can substantially decrease the number of features without detriment to classification accuracy. The best imputation methods achieve almost the same classification accuracy for data with missing values as for the original data. XAI approaches allowed us to explain model predictions from both population-level and individual perspectives.

Conclusions: We propose a methodologically valid and comprehensive approach to the classification of healthy individuals and patients with various diseases based on whole-blood DNA methylation data, using Parkinson's disease and schizophrenia as examples. The proposed algorithm works better for the former pathology, characterized by a complex set of symptoms. It addresses data harmonization problems in the meta-analysis of many different datasets, imputes missing values, and builds classification models of low dimensionality.
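To make the imputation step more concrete, the following is a minimal sketch of nearest-neighbour imputation on a small matrix of methylation beta values. It is an illustration under stated assumptions, not the authors' method: the abstract does not name the imputation algorithms that were compared, so k-nearest-neighbour imputation is chosen here only as one common option. The function names `distance` and `impute_knn` and the synthetic values are hypothetical.

```python
import math


def distance(a, b):
    """Root-mean-square difference over CpG sites observed in both samples."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def impute_knn(samples, k=2):
    """Fill each missing value with the mean of that CpG site in the k nearest samples."""
    imputed = [row[:] for row in samples]
    for i, row in enumerate(samples):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other samples in which this CpG site is observed.
            donors = [
                (distance(row, other), other[j])
                for m, other in enumerate(samples)
                if m != i and other[j] is not None
            ]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


# Synthetic beta values (assumption): rows are samples, columns are CpG sites,
# and None marks a missing measurement.
betas = [
    [0.10, 0.80, 0.50],
    [0.12, 0.82, None],
    [0.90, 0.20, 0.10],
    [0.11, 0.79, 0.54],
]
filled = impute_knn(betas, k=2)
```

In this toy example the missing value in the second sample is filled from the two samples with the most similar observed profile. In a workflow like the one described, the imputed matrix would then be passed to the classifier, and accuracy would be compared against the same model trained on complete data.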
