4.7 Article

A dynamic ensemble approach to robust classification in the presence of missing data

Journal

MACHINE LEARNING
Volume 102, Issue 3, Pages 443-463

Publisher

SPRINGER
DOI: 10.1007/s10994-015-5530-z

Keywords

Missing data; Ensemble methods; Hemodynamic instability

Many real-world datasets suffer from missing or incomplete data. In the healthcare setting, for example, certain patient measurement parameters, such as vitals and/or lab values, may be missing due to insufficient monitoring. When present, however, these features could be highly discriminative in predicting aspects of patient state. Therefore, it is desirable to incorporate these sparsely measured features into a predictive model. Training predictive algorithms on such datasets is complicated by the missing data. Overcoming this problem is usually achieved by first estimating values for the missing data, which is referred to as data imputation. Without strong prior knowledge about the relationship between features, though, it is common to fill in missing values with their respective population mean or median. This approach has limited accuracy, however, and may simply inject noise into the data. We propose a two-stage machine learning algorithm that learns a dynamic classifier ensemble from an incomplete dataset without data imputation. The algorithm is very simple to implement and applicable across a wide range of problems. Our method first employs a variant of AdaBoost to learn a set of low-dimensional classifiers, each of which abstains from predicting if the feature(s) it depends on are missing. Our novel contribution is the secondary dynamic ensemble learning stage, in which the low-dimensional classifiers are combined using a dynamic weighting that depends on the pattern of measured features in the present input data. This allows the model to be resilient to missing data by adjusting the strength of certain classifiers to account for missing features. We apply our algorithm to early detection of hemodynamic instability in ICU patients. Providing an effective risk score of hemodynamic instability has the potential to give the clinician sufficient time to intervene, thereby reducing the chance of organ damage due to insufficient blood perfusion. We compare the results of our algorithm to other common missing data approaches, including mean imputation and multiple imputation methods, and discuss the advantages of the approach given the constraints of the application domain (e.g., high specificity to combat hospital alarm fatigue).
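
The idea of abstaining low-dimensional classifiers combined with a weighting that depends on which features are measured can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract says the dynamic weights are learned in a second stage, whereas the sketch below simply renormalizes fixed weights over the classifiers that did not abstain, which is an assumption made for illustration. The names `Stump` and `dynamic_score`, the thresholds, and the weights are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Stump:
    """Hypothetical one-feature threshold classifier that abstains on missing input."""
    feature: int      # index of the single feature this stump depends on
    threshold: float  # decision threshold (illustrative value, not from the paper)
    alpha: float      # static weight, e.g. as produced by a boosting stage

    def predict(self, x: Sequence[Optional[float]]) -> int:
        value = x[self.feature]
        if value is None or math.isnan(value):
            return 0  # abstain: the feature was not measured
        return 1 if value > self.threshold else -1


def dynamic_score(stumps: List[Stump], x: Sequence[Optional[float]]) -> float:
    """Weighted vote in [-1, 1], renormalized over non-abstaining stumps.

    Assumption: renormalization stands in for the learned dynamic weighting
    described in the abstract.
    """
    votes = [(s.alpha, s.predict(x)) for s in stumps]
    active_weight = sum(alpha for alpha, vote in votes if vote != 0)
    if active_weight == 0:
        return 0.0  # every stump abstained, so no evidence either way
    return sum(alpha * vote for alpha, vote in votes) / active_weight


# Two hypothetical features: index 0 is densely measured, index 1 is sparse.
stumps = [Stump(feature=0, threshold=100.0, alpha=2.0),
          Stump(feature=1, threshold=2.0, alpha=1.0)]

full = dynamic_score(stumps, [120.0, 1.0])              # both stumps vote
partial = dynamic_score(stumps, [120.0, float("nan")])  # second stump abstains
```

With both features present the two stumps disagree and the score is diluted; when the sparse feature is missing, the remaining stump's vote is rescaled to carry the full weight, so no imputed value enters the prediction.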
