Article

Learning the progression patterns of treatments using a probabilistic generative model

Journal

JOURNAL OF BIOMEDICAL INFORMATICS
Volume 137

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE
DOI: 10.1016/j.jbi.2022.104271

Keywords

Disease progression modeling; Electronic health records; Markov model; Probabilistic generative model; Unsupervised machine learning

This paper presents a probabilistic generative model for disease modeling and patient treatment based on Electronic Health Records. The model aims to identify distinct subtypes of treatments for a given disease and to discover how they develop and progress. It uses a hierarchical structure of latent variables to classify treatment sequences and to segment them into progression stages. The model is learned with the Expectation-Maximization algorithm, which is solved efficiently with a method based on dynamic programming. The evaluation covers recovering the generative model underlying synthetic data and assessing the model's ability to provide treatment classification and staging information on real-world data. The model can be used for classification, simulation, data augmentation, and missing-data imputation.

Modeling a disease or the treatment of a patient has drawn much attention in recent years because of the vast amount of information that Electronic Health Records contain. This paper presents a probabilistic generative model of treatments described as variable-length sequences of medical activities. The main objective is to identify distinct subtypes of treatments for a given disease and to discover their development and progression. To this end, the model assumes that each sequence of actions has an associated hierarchical structure of latent variables that both classifies the sequences according to their evolution over time and segments them into progression stages. The model is learned with the Expectation-Maximization algorithm, which accounts for the exponential number of configurations of the latent variables and is solved efficiently with a method based on dynamic programming. The evaluation of the model is twofold: first, we use synthetic data to demonstrate that the learning procedure recovers the generative model underlying the data; second, we assess the potential of our model to provide treatment classification and staging information on real-world data. Our model can be seen as a tool for classification, simulation, data augmentation, and missing-data imputation.
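The abstract does not give the model's equations, so the following is only an illustrative sketch under stated assumptions, not the authors' model. It swaps in a plain mixture of left-to-right hidden Markov models: a latent class plays the role of the treatment subtype, and a latent stage path segments the activities into progression stages. The activity codes in ACTIVITIES, the probabilities in MODEL, and the function names are all hypothetical. The sketch shows two ideas from the abstract: ancestral sampling (the "simulation" and "data augmentation" uses), and how a dynamic-programming recursion (here the standard forward algorithm) sums over all K^T stage paths in O(T·K²) time, which is the quantity an E-step needs to compute class responsibilities.

```python
import math
import random

# Hypothetical activity codes; real EHR data would use its own vocabulary.
ACTIVITIES = ["lab_test", "drug_A", "surgery"]

# Hypothetical toy parameters (not taken from the paper): 2 treatment subtypes,
# each a left-to-right chain of 2 progression stages over the 3 activities.
MODEL = {
    "class_prior": [0.5, 0.5],
    "classes": [
        {   # subtype 0: stage 1 is dominated by drug_A
            "init": [1.0, 0.0],
            "trans": [[0.7, 0.3], [0.0, 1.0]],
            "emit": [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]],
        },
        {   # subtype 1: stage 1 is dominated by surgery
            "init": [1.0, 0.0],
            "trans": [[0.6, 0.4], [0.0, 1.0]],
            "emit": [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]],
        },
    ],
}


def _log(p):
    """Log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def _logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def sample_sequence(model, length, rng):
    """Ancestral sampling: draw a subtype, then a stage path, then activities."""
    c = rng.choices(range(len(model["class_prior"])), weights=model["class_prior"])[0]
    params = model["classes"][c]
    n_stages = len(params["init"])
    stages, acts = [], []
    s = rng.choices(range(n_stages), weights=params["init"])[0]
    for t in range(length):
        if t > 0:
            s = rng.choices(range(n_stages), weights=params["trans"][s])[0]
        stages.append(s)
        acts.append(rng.choices(range(len(ACTIVITIES)), weights=params["emit"][s])[0])
    return c, stages, acts


def log_likelihood_given_class(params, seq):
    """Forward algorithm: log p(seq | class), summed over all K**T stage paths
    in O(T * K**2) time instead of enumerating them."""
    K = len(params["init"])
    # alpha[k] = log p(x_1..x_t, stage_t = k | class)
    alpha = [_log(params["init"][k]) + _log(params["emit"][k][seq[0]]) for k in range(K)]
    for x in seq[1:]:
        alpha = [
            _logsumexp([alpha[j] + _log(params["trans"][j][k]) for j in range(K)])
            + _log(params["emit"][k][x])
            for k in range(K)
        ]
    return _logsumexp(alpha)


def class_posterior(model, seq):
    """E-step quantity: p(class | seq) for every treatment subtype."""
    scores = [
        _log(prior) + log_likelihood_given_class(params, seq)
        for prior, params in zip(model["class_prior"], model["classes"])
    ]
    z = _logsumexp(scores)
    return [math.exp(s - z) for s in scores]


if __name__ == "__main__":
    rng = random.Random(0)
    subtype, stages, acts = sample_sequence(MODEL, 6, rng)
    print("sampled subtype", subtype, "stages", stages,
          "activities", [ACTIVITIES[a] for a in acts])
    seq = [0, 1, 1]  # lab_test, drug_A, drug_A
    print("p(seq | subtype 0) =", math.exp(log_likelihood_given_class(MODEL["classes"][0], seq)))
    print("p(subtype | seq) =", class_posterior(MODEL, seq))
```

For the sequence [0, 1, 1], the forward pass gives p(seq | subtype 0) = 0.17096, the same value obtained by enumerating all 2^3 stage paths by hand, and the posterior assigns about 0.955 of the mass to subtype 0. A full EM learner would re-estimate MODEL from these posteriors (plus per-step stage posteriors from a backward pass), which this sketch omits.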
