Article

Temporal characterization of Alzheimer's Disease with sequences of clinical records

Journal

EBIOMEDICINE
Volume 92

Publisher

ELSEVIER
DOI: 10.1016/j.ebiom.2023.104629

Keywords

Alzheimer's Disease; Temporal representation mining; Electronic health records; Cohort identification


This study developed computational models for identifying Alzheimer's Disease (AD) cohorts and compared the utility of AD diagnosis codes and temporal representations from electronic health records (EHRs) for characterizing AD cohorts. The models with sequential features improved AD classification by 3-16% over the use of diagnosis codes alone. These findings have important implications for accelerating AD research and precision drug development.
Background: Alzheimer's Disease (AD) is a complex clinical phenotype with unprecedented social and economic tolls on an ageing global population. Real-world data (RWD) from electronic health records (EHRs) offer opportunities to accelerate precision drug development and scale epidemiological research on AD. A precise characterization of AD cohorts is needed to address the noise abundant in RWD.
Methods: We conducted a retrospective cohort study to develop and test computational models for AD cohort identification using clinical data from 8 Massachusetts healthcare systems. We mined temporal representations from EHR data using the transitive sequential pattern mining algorithm (tSPM) to train and validate our models. We then tested our models against a held-out test set from a review of medical records to adjudicate the presence of AD. We trained two classes of Machine Learning models, using Gradient Boosting Machine (GBM), to compare the utility of AD diagnosis records versus the tSPM temporal representations (comprising sequences of diagnosis and medication observations) from electronic medical records for characterizing AD cohorts.
Findings: In a group of 4985 patients, we identified 219 tSPM temporal representations (i.e., transitive sequences) of medical records for constructing the best classification models. The models with sequential features improved AD classification by a magnitude of 3-16 percent over the use of AD diagnosis codes alone. The computed cohort included 663 patients, 35 of whom had no record of AD. Six groups of tSPM sequences were identified for characterizing the AD cohorts.
Interpretation: We present sequential patterns of diagnosis and medication codes from electronic medical records, as digital markers of Alzheimer's Disease. Classification algorithms developed on sequential patterns can replace standard features from EHRs to enrich phenotype modelling.
Funding: National Institutes of Health: the National Institute on Aging (RF1AG074372) and the National Institute of Allergy and Infectious Diseases (R01AI165535).
Copyright (c) 2023 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
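The Methods describe mining transitive sequences of diagnosis and medication observations as model features. The sketch below illustrates one plausible reading of that idea and is not the authors' tSPM implementation: it assumes a transitive sequence is an ordered pair of codes (A, B) in which the first record of A precedes the first record of B for the same patient. The function names `mine_transitive_sequences` and `to_indicator_rows`, the record layout, and the example codes are all hypothetical.

```python
from collections import defaultdict
from datetime import date
from itertools import combinations


def mine_transitive_sequences(records):
    """Return {patient_id: set of (earlier_code, later_code)} pairs.

    records: iterable of (patient_id, date, code) tuples.
    Assumption: only the first occurrence of each code per patient counts.
    """
    first_seen = defaultdict(dict)
    for patient_id, when, code in records:
        seen = first_seen[patient_id]
        if code not in seen or when < seen[code]:
            seen[code] = when

    sequences = {}
    for patient_id, seen in first_seen.items():
        # Order codes by first-occurrence date, then pair each code with
        # every code first seen strictly later (same-day pairs are skipped).
        ordered = sorted(seen.items(), key=lambda item: item[1])
        pairs = set()
        for (code_a, date_a), (code_b, date_b) in combinations(ordered, 2):
            if date_a < date_b:
                pairs.add((code_a, code_b))
        sequences[patient_id] = pairs
    return sequences


def to_indicator_rows(sequences):
    """Turn mined pairs into binary feature rows over a shared vocabulary."""
    vocabulary = sorted(set().union(*sequences.values()))
    rows = {
        patient_id: [1 if seq in pairs else 0 for seq in vocabulary]
        for patient_id, pairs in sequences.items()
    }
    return vocabulary, rows


if __name__ == "__main__":
    # Hypothetical records; the codes are illustrative, not from the study.
    example = [
        ("p1", date(2020, 1, 1), "DX:memory_loss"),
        ("p1", date(2020, 6, 1), "RX:donepezil"),
        ("p1", date(2021, 1, 1), "DX:AD"),
        ("p2", date(2019, 3, 1), "DX:AD"),
    ]
    vocabulary, rows = to_indicator_rows(mine_transitive_sequences(example))
    for patient_id, row in rows.items():
        print(patient_id, dict(zip(vocabulary, row)))
```

Under these assumptions, each patient becomes a row of 0/1 indicators over the mined sequences, which is the kind of input a gradient boosting classifier could be trained on alongside, or in place of, plain diagnosis-code features.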
