Article

Synthetic Data as a Proxy for Real-World Electronic Health Records in the Patient Length of Stay Prediction

Journal

SUSTAINABILITY
Volume 15, Issue 18

Publisher

MDPI
DOI: 10.3390/su151813690

Keywords

generative artificial intelligence; synthetic tabular data; healthcare industry; synthetic electronic health records (EHR); patient length of stay (LOS)


This paper examines the use of generative adversarial networks (GANs) to generate synthetic tabular electronic health records (EHR) data for predicting patient length of stay (LOS) in the healthcare industry. Comparing the Conditional Tabular GAN (CTGAN) with the Copula GAN, it finds that the CTGAN performs better in this use case. However, there is still room for improvement when applying state-of-the-art GAN models to clinical healthcare data.
Generative artificial intelligence has gained popularity for tasks such as image creation, but it can also be used to create synthetic tabular data. This holds great potential, especially for the healthcare industry, where data are often scarce and subject to privacy restrictions. For instance, synthetic electronic health records (EHR) promise to improve the use of machine learning algorithms, which usually require large amounts of data. This also applies to the prediction of the patient length of stay (LOS), a key measure for hospitals and one of the core tools decision makers use to plan the allocation of resources. This paper therefore aims to add to the still-young research on applying generative adversarial networks (GANs) to tabular EHR. Its intention is to leverage the advantages of synthetic data for LOS prediction and thereby contribute to the efficiency-enhancing and cost-saving aspirations of hospitals and insurance companies. To that end, it examines whether synthetic data generated using GANs can serve as a proxy for scarce real-world EHR in the patient LOS multi-class classification task. The Conditional Tabular GAN (CTGAN) and the Copula GAN are selected as the underlying models because they are state-of-the-art GAN architectures designed for generating synthetic tabular data. The CTGAN is found to be the superior model for this use case. Nevertheless, the paper shows that there is still room for improvement when applying state-of-the-art GAN architectures to clinical healthcare data.
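
The idea of using synthetic data as a proxy can be illustrated with a "train on synthetic, test on real" comparison. The following is only a minimal sketch under loud assumptions, not the paper's pipeline: the records are toy values rather than EHR data, the LOS class thresholds and feature names are hypothetical, the generator is a simple per-class Gaussian sampler standing in for where a CTGAN or Copula GAN would be fitted, and the classifier is a nearest-centroid rule chosen only to keep the example dependency-free.

```python
import random
import statistics

def los_class(days):
    """Bin a stay length in days into three classes (thresholds are illustrative)."""
    if days <= 3:
        return "short"
    if days <= 7:
        return "medium"
    return "long"

def make_toy_records(n, rng):
    """Toy stand-in for real EHR rows: ((severity, comorbidities), LOS class)."""
    rows = []
    for _ in range(n):
        severity = rng.uniform(0, 10)
        comorbidities = rng.uniform(0, 5)
        days = 0.8 * severity + 0.6 * comorbidities + rng.gauss(0, 0.5)
        rows.append(((severity, comorbidities), los_class(days)))
    return rows

def group_by_class(rows):
    by_class = {}
    for x, y in rows:
        by_class.setdefault(y, []).append(x)
    return by_class

def fit_placeholder_generator(rows):
    """Per-class feature means and standard deviations. NOT a GAN: a placeholder
    for the step where a tabular GAN would be trained on the real records."""
    model = {}
    for y, xs in group_by_class(rows).items():
        cols = list(zip(*xs))
        means = [statistics.mean(c) for c in cols]
        stds = [statistics.pstdev(c) for c in cols]
        model[y] = (means, stds, len(xs))
    return model

def sample_synthetic(model, n, rng):
    """Draw n synthetic rows, keeping the class proportions of the training data."""
    classes = list(model)
    weights = [model[c][2] for c in classes]
    rows = []
    for _ in range(n):
        y = rng.choices(classes, weights=weights)[0]
        means, stds, _ = model[y]
        rows.append((tuple(rng.gauss(m, s) for m, s in zip(means, stds)), y))
    return rows

def fit_centroids(rows):
    """Nearest-centroid classifier: one mean feature vector per class."""
    return {y: tuple(statistics.mean(c) for c in zip(*xs))
            for y, xs in group_by_class(rows).items()}

def predict(centroids, x):
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

def accuracy(centroids, rows):
    return sum(predict(centroids, x) == y for x, y in rows) / len(rows)

rng = random.Random(0)
real_train = make_toy_records(300, rng)
real_test = make_toy_records(200, rng)
synthetic_train = sample_synthetic(fit_placeholder_generator(real_train), 300, rng)

# Both classifiers are scored on the same held-out "real" rows.
acc_real = accuracy(fit_centroids(real_train), real_test)
acc_synth = accuracy(fit_centroids(synthetic_train), real_test)
print(f"train on real, test on real:      {acc_real:.3f}")
print(f"train on synthetic, test on real: {acc_synth:.3f}")
```

The closer the second accuracy is to the first, the better the synthetic rows act as a proxy for the real ones in this classification task. Other evaluation criteria, such as privacy or statistical similarity, are outside this sketch.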
