Article

Combining attention with spectrum to handle missing values on time series data without imputation

Journal

INFORMATION SCIENCES
Volume 609, Issue -, Pages 1271-1287

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2022.07.124

Keywords

Missing value; Incomplete data; Attention neural network; Deep learning; Electronic health record; Imputation

Funding

  1. Ministry of Science and Technology, Taiwan [MOST109-2634-F002-031, MOST109-2634-F002-041, MOST 109-2634-F-002-029]

In the development of predictive models, handling missing data is a critical issue. Traditional approaches require a two-step analysis to analyze missing patterns, select variables, impute missing values, and train models. However, these models have limitations in handling high missing rates and variable changes. To address this problem, the researchers propose an attention-based neural network combined with a novel real number representation. This algorithm requires less manual variable selection and can overlook missing data, eliminating the need for imputation. The results show that the proposed algorithm outperforms current approaches in predicting prolonged length of stay in the ICU.
In the development of predictive models, the problem of missing data is a critical issue that traditionally requires a two-step analysis. Data scientists analyze the patterns of missing values, select variables, impute missing values on the basis of domain knowledge, and then train a model. Models typically have their input sizes hardcoded, and have limitations in handling data with high missing rates or changes in available variables. We propose an attention-based neural network combined with a novel real number representation, which requires little work on manually selecting variables, and in which missing data can be overlooked, making imputation unnecessary. In this proposed model, data analysis can be one step, omitting the first step of imputing missing values. The study included data on 32,709 intensive care unit (ICU) admissions and 60 healthcare variables from the Medical Information Mart for Intensive Care (MIMIC)-IV. The proposed algorithm yielded an area under the receiver operating characteristic curve (AUC) of 0.842 (95% CIs: 0.828-0.856) when predicting prolonged length of stay in the ICU, outperforming current approaches using imputation methods. The proposed algorithm can be applied to a range of problems in data science, as it addresses the issue of incomplete data with automatic variable selection. (c) 2022 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
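
The abstract does not spell out the network or its real number representation, so the following is only a minimal sketch of the general idea that attention can skip missing inputs instead of imputing them. It is not the authors' model. The function name `masked_attention_pool`, the parameters `var_emb`, `value_w`, and `query`, and the encoding "variable embedding plus value times a direction vector" are all assumptions made for illustration.

```python
import numpy as np

def masked_attention_pool(values, observed, var_emb, value_w, query):
    """Pool the observed measurements of one record into a fixed-size vector.

    values:   (n_vars,) floats; entries at missing positions are ignored
    observed: (n_vars,) bools; True where a value was actually recorded
    var_emb:  (n_vars, d) per-variable embedding (hypothetical parameter)
    value_w:  (d,) direction used to encode the real value (hypothetical)
    query:    (d,) pooling query vector (hypothetical parameter)
    """
    if not observed.any():
        raise ValueError("at least one variable must be observed")
    # Zero out missing entries so NaNs cannot propagate; this is not
    # imputation, because these positions receive zero attention weight.
    safe = np.where(observed, values, 0.0)
    # Assumed token encoding: which variable it is, plus how large it is.
    tokens = var_emb + safe[:, None] * value_w[None, :]      # (n_vars, d)
    scores = tokens @ query / np.sqrt(tokens.shape[1])        # (n_vars,)
    # Mask: missing variables get a score of -inf, hence weight exp(-inf) = 0.
    scores = np.where(observed, scores, -np.inf)
    scores = scores - scores[observed].max()                  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum()
    return weights @ tokens, weights
```

Because the mask removes missing positions from the softmax, the pooled vector depends only on the variables that were recorded, and the same parameters work for records with different sets of available variables. In a trained model the parameters would be learned; here they are plain arrays supplied by the caller.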
