Article

Unsupervised Machine Learning to Identify High Likelihood of Dementia in Population-Based Surveys: Development and Validation Study

Journal

JOURNAL OF MEDICAL INTERNET RESEARCH
Volume 20, Issue 7

Publisher

JMIR PUBLICATIONS, INC
DOI: 10.2196/10493

Keywords

dementia; cognition disorders; health surveys; electronic health records; diagnosis; unsupervised machine learning; cluster analysis; data mining

Funding

  1. Global Brain Health Institute
  2. National Institute on Aging [NIA U01AG009740]
  3. European Commission through FP5 [QLK6-CT-2001-00360]
  4. European Commission through FP6 [RII-CT-2006-062193, CIT5-CT-2005-028857]
  5. European Commission through FP7 [211909, 227822, 261982]
  6. National Institute on Aging [R01 AG030153, RC2 AG036691, R03 AG043052]
  7. National Institute on Aging [K24 AG031155]
  8. [ANR-10-LABX-0087 IEC]
  9. [ANR-10-IDEX-0001-02 PSL]

Background: Dementia is increasing in prevalence worldwide, yet it frequently remains undiagnosed, especially in low- and middle-income countries. Population-based surveys are an underinvestigated resource for identifying individuals at risk of dementia.
Objective: The aim is to identify participants with a high likelihood of dementia in population-based surveys without requiring a clinical diagnosis of dementia in a subsample.
Methods: Unsupervised machine learning classification (hierarchical clustering on principal components) was developed in the Health and Retirement Study (HRS; 2002-2003, N=18,165 individuals) and validated in the Survey of Health, Ageing and Retirement in Europe (SHARE; 2010-2012, N=58,202 individuals).
Results: Unsupervised machine learning classification identified three clusters in HRS: cluster 1 (n=12,231) without any functional or motor limitations, cluster 2 (n=4841) with walking/climbing limitations, and cluster 3 (n=1093) with both functional and walking/climbing limitations. Comparison of cluster 3 with previously published predicted probabilities of dementia in HRS showed that it identified a high likelihood of dementia (probability of dementia >0.95; area under the curve [AUC]=0.91). Removing either cognitive measures or both cognitive and behavioral measures did not impede accurate classification (AUC=0.91 and AUC=0.90, respectively). Three clusters with similar profiles were identified in SHARE (cluster 1: n=40,223; cluster 2: n=15,644; cluster 3: n=2335). The survival rate of participants in cluster 3 was 39.2% (n=665 deceased) in HRS and 62.2% (n=811 deceased) in SHARE after a 3.9-year follow-up. Among surviving participants in cluster 3, functional and mobility performance worsened in both cohorts over the same period.
Conclusions: Unsupervised machine learning identifies a high likelihood of dementia in population-based surveys, even without cognitive and behavioral measures and without requiring a clinical diagnosis of dementia in a subsample of the population. This method could be used to tackle the global challenge of dementia.
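
The Methods name hierarchical clustering on principal components. The sketch below illustrates that general technique only and is not the authors' pipeline: it assumes NumPy and SciPy are available, uses synthetic data in place of HRS or SHARE variables, and the function name hcpc, the four toy "limitation" columns, the group sizes, the choice of two components, and the use of Ward linkage are all assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def hcpc(X, n_components=2, n_clusters=3):
    """Hypothetical helper: Ward clustering on leading principal components."""
    # Standardize each column to zero mean and unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # PCA via SVD; U * S gives the principal component scores.
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    # Agglomerative clustering (Ward linkage) on the component scores,
    # then cut the tree into at most n_clusters groups (labels start at 1).
    tree = linkage(scores, method="ward")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return labels, scores


# Synthetic stand-in data (NOT survey data): columns 0-1 mimic mobility
# limitation scores, columns 2-3 mimic functional limitation scores.
rng = np.random.default_rng(0)
centers = np.array([
    [0.0, 0.0, 0.0, 0.0],  # no limitations
    [3.0, 3.0, 0.0, 0.0],  # mobility limitations only
    [3.0, 3.0, 3.0, 3.0],  # mobility and functional limitations
])
sizes = [60, 30, 10]
X = np.vstack([
    center + rng.normal(scale=0.3, size=(n, 4))
    for center, n in zip(centers, sizes)
])

labels, scores = hcpc(X)
# Cluster numbering is arbitrary, so report sizes per label.
print({int(k): int((labels == k).sum()) for k in np.unique(labels)})
```

In this toy setting the three groups are well separated by construction, so the cut at three clusters recovers them; real survey data would call for a principled choice of the number of components and clusters, which the abstract does not specify.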
