Article

EvoImputer: An evolutionary approach for Missing Data Imputation and feature selection in the context of supervised learning

KNOWLEDGE-BASED SYSTEMS (2022)

Journal

KNOWLEDGE-BASED SYSTEMS

Volume 236, Issue -, Pages -

Publisher

ELSEVIER

DOI: 10.1016/j.knosys.2021.107734

Keywords

Missing data; Data imputation; Evolutionary algorithms

Category

Computer Science, Artificial Intelligence

Smart summary

Missing data is a significant problem in knowledge extraction, and this paper proposes a new approach that uses evolutionary algorithms to evaluate the usefulness of data imputation for prediction model performance. The method selects the best subset of incomplete features that can enhance the learning process and maximize the prediction power. The proposed method outperforms traditional imputation methods and other evolutionary-based imputation methods in experiments.

Abstract

Missing data is a considerable problem in knowledge extraction, where the completeness and quality of the data play a major role in data analysis. In many applications, ignoring the records with missing values may adversely affect the prediction process and create a significant bias in the resulting data. Therefore, Missing Data Imputation (MDI) has become mandatory to tackle the negative consequences of missing data. However, different features respond differently to data imputation: imputing some features can enhance the learning process, while imputing others may lead to worse results, depending on the feature properties. This paper proposes the use of evolutionary algorithms to evaluate the usefulness of imputing each feature for the performance of the prediction model, in order to select the best subset of incomplete features that can enhance the learning process and maximize the prediction power of the model once the missing values have been handled properly. The resulting approach handles missing values and performs feature selection simultaneously, in order to enhance the model's learning performance and reduce the negative consequences of imputation. The performance of the proposed method was evaluated using 10 benchmark datasets under 10-fold cross-validation. The results were compared with five classical imputation methods (mean, median, multiple imputation, expectation maximization, and K-nearest neighbours). The proposed methodology significantly outperformed the other methods in terms of accuracy, sensitivity, specificity, geometric mean, and area under the curve. Moreover, the effectiveness of the proposed method was compared against three recent evolutionary-based imputation methods, where the proposed methodology outperformed the other methods in terms of accuracy in 75% of the datasets. (C) 2021 Elsevier B.V. All rights reserved.
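
The abstract does not specify the chromosome encoding, the genetic operators, the imputation applied inside the search, or the classifier, so the following is only a minimal sketch of the general idea and not the authors' EvoImputer implementation. It assumes a binary mask over the incomplete features (1 = impute with the training-fold mean and keep, 0 = drop), a nearest-centroid classifier as a stand-in learner, and k-fold accuracy as the fitness. All function names, parameters, and the synthetic data are hypothetical.

```python
import random

def incomplete_columns(X):
    """Indices of columns that contain at least one missing value (None)."""
    return [j for j in range(len(X[0])) if any(row[j] is None for row in X)]

def column_means(X, cols):
    """Mean of the observed values per column (0.0 if nothing is observed)."""
    means = {}
    for j in cols:
        seen = [row[j] for row in X if row[j] is not None]
        means[j] = sum(seen) / len(seen) if seen else 0.0
    return means

def prepare(X, keep, means):
    """Keep only the columns in `keep`, filling gaps with the given means."""
    return [[means[j] if row[j] is None else row[j] for j in keep] for row in X]

def centroid_accuracy(Xtr, ytr, Xte, yte):
    """Accuracy of a nearest-centroid classifier (assumed stand-in learner)."""
    cents = {}
    for c in set(ytr):
        rows = [x for x, label in zip(Xtr, ytr) if label == c]
        cents[c] = [sum(col) / len(rows) for col in zip(*rows)]
    def predict(x):
        return min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, cents[c])))
    return sum(predict(x) == label for x, label in zip(Xte, yte)) / len(yte)

def fitness(mask, X, y, inc, k=5):
    """Mean k-fold accuracy when only masked-in incomplete columns are imputed."""
    dropped = {j for j, bit in zip(inc, mask) if bit == 0}
    keep = [j for j in range(len(X[0])) if j not in dropped]
    scores = []
    for f in range(k):
        test = [i for i in range(len(X)) if i % k == f]
        train = [i for i in range(len(X)) if i % k != f]
        Xtr_raw = [X[i] for i in train]
        means = column_means(Xtr_raw, keep)  # fitted on the training fold only
        Xtr = prepare(Xtr_raw, keep, means)
        Xte = prepare([X[i] for i in test], keep, means)
        scores.append(centroid_accuracy(Xtr, [y[i] for i in train],
                                        Xte, [y[i] for i in test]))
    return sum(scores) / k

def evolve(X, y, pop_size=12, generations=15, p_mut=0.1, seed=0):
    """Search impute-or-drop masks with a small elitist genetic algorithm."""
    rng = random.Random(seed)
    inc = incomplete_columns(X)
    pop = [[rng.randint(0, 1) for _ in inc] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: fitness(m, X, y, inc), reverse=True)
        elite = ranked[: pop_size // 2]      # truncation selection, elites survive
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randint(0, len(inc))   # one-point crossover
            child = a[:cut] + b[cut:]
            children.append([bit ^ (rng.random() < p_mut) for bit in child])  # bit-flip mutation
        pop = elite + children
    best = max(pop, key=lambda m: fitness(m, X, y, inc))
    return dict(zip(inc, best)), fitness(best, X, y, inc)

def demo_data(n=60, seed=1):
    """Synthetic data: one complete signal, one partial signal, one partial noise column."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        signal = label * 2.0 + rng.gauss(0, 0.5)
        partial = label * 2.0 + rng.gauss(0, 0.5)
        noise = rng.gauss(0, 10.0)
        X.append([signal,
                  None if rng.random() < 0.2 else partial,
                  None if rng.random() < 0.2 else noise])
        y.append(label)
    return X, y
```

Calling `evolve(*demo_data())` returns a dictionary that maps each incomplete column index to its impute-or-drop decision, together with the cross-validated accuracy of that choice. Fitting the column means on the training fold only is a deliberate choice in this sketch, so that the fitness is not inflated by information from the test fold.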
