Predicting Alzheimer's Disease from Spoken and Written Language Using Fusion-Based Stacked Generalization

JOURNAL OF BIOMEDICAL INFORMATICS (2021)

Journal

JOURNAL OF BIOMEDICAL INFORMATICS

Volume 118, Issue -, Pages -

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE

DOI: 10.1016/j.jbi.2021.103803

Keywords

Machine learning; Feature selection; Information fusion; Ensemble classifier; Cognitive decline; Clinical diagnosis; Neurolinguistics

Categories

Computer Science, Interdisciplinary Applications; Medical Informatics

Funding

NICHD
RIDIR
NSF
Carnegie Mellon University
SBE NIDCD

Summary

The study emphasizes the importance of automating the diagnosis of Alzheimer's disease (AD) from language deficits, and develops multiple heterogeneous stacked fusion models to improve the generalizability and robustness of machine learning (ML) models for AD diagnosis. Models trained on two different datasets, one spoken and one written, achieved high AUC, accuracy, and F1 scores. The authors suggest replacing the initial traditional screening test with such models for fully automated remote diagnosis.

Abstract

The importance of automating the diagnosis of Alzheimer's disease (AD) to facilitate its early prediction has long been emphasized, but progress has been hampered in part by a lack of empirical support. Given the evident association of AD with age and the increasing aging population owing to the general well-being of individuals, the estimated economic burden is unprecedented. Consequently, many recent studies have attempted to exploit the language deficits caused by cognitive decline to automate the diagnostic task by training machine learning (ML) algorithms on linguistic patterns and deficits. In this study, we aim to develop multiple heterogeneous stacked fusion models that harness the advantages of several base learning algorithms to improve the overall generalizability and robustness of AD diagnostic ML models. We trained these stacked fusion models in parallel on two different datasets, one written and one spoken. Further, we examined the effect of linking the two datasets to develop a hybrid stacked fusion model that can predict AD from both written and spoken language. Our feature spaces comprised two widely used kinds of linguistic pattern: lexicosyntactic features and character n-grams. We first investigated lexicosyntactics in AD alongside healthy controls (HC), exploring a few new lexicosyntactic features, and then optimized the lexicosyntactic feature space by proposing a correlation-based feature selection technique that eliminates features based on their feature-feature inter-correlations and feature-target correlations according to a certain threshold. Our stacked fusion models establish benchmarks on both datasets, with AUCs of 98.1% and 99.47% for the spoken and written datasets, respectively, and corresponding accuracy and F1 scores of around 95% on the spoken dataset and around 97% on the written dataset. Likewise, the hybrid stacked fusion model on the linked data presents strong performance, with 99.2% AUC and accuracy and F1 scores of around 97%. In view of the achieved performance and the enhanced generalizability of such fusion models over single classifiers, this study suggests replacing the initial traditional screening test with such models, which can be embedded in an online format for fully automated remote diagnosis.
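
The correlation-based feature selection mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact procedure or threshold values, so everything below is an assumption made for illustration: the two thresholds (`min_target_corr`, `max_inter_corr`), the greedy ordering by relevance, the use of Pearson correlation, and the toy feature names and values (which are not data from the study).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlation_filter(features, target, min_target_corr=0.1, max_inter_corr=0.9):
    """Return the names of the features that survive both correlation checks.

    features: dict mapping feature name -> list of values (one per sample).
    target:   list of class labels (0 = HC, 1 = AD in this toy setup).
    Both thresholds are hypothetical; the source only says "a certain threshold".
    """
    # Feature-target check: how strongly each feature tracks the label.
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    candidates = [name for name in features if relevance[name] >= min_target_corr]

    # Feature-feature check: visit the most relevant features first and skip
    # any feature that is nearly collinear with one already kept.
    candidates.sort(key=lambda name: relevance[name], reverse=True)
    kept = []
    for name in candidates:
        if all(abs(pearson(features[name], features[k])) <= max_inter_corr for k in kept):
            kept.append(name)
    return kept


# Toy data, invented for illustration only.
labels = [0, 0, 0, 1, 1, 1]
toy_features = {
    "word_count": [1, 2, 3, 4, 5, 6],
    "token_count": [2, 4, 6, 8, 10, 12],   # exact multiple of word_count: redundant
    "pronoun_ratio": [3, 1, 2, 6, 4, 5],   # relevant, only moderately tied to word_count
    "noise": [1, 2, 3, 3, 2, 1],           # uncorrelated with the labels
}

kept = correlation_filter(toy_features, labels)
print(kept)
```

On this toy input the filter drops `noise` (no correlation with the label) and keeps `pronoun_ratio` together with only one of the two collinear length features. Which of `word_count` or `token_count` survives depends on tie-breaking between equal relevance scores.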
