EvoImputer: An evolutionary approach for Missing Data Imputation and feature selection in the context of supervised learning

KNOWLEDGE-BASED SYSTEMS (2022)

Journal

KNOWLEDGE-BASED SYSTEMS

Volume 236

Publisher

ELSEVIER

DOI: 10.1016/j.knosys.2021.107734

Keywords

Missing data; Data imputation; Evolutionary algorithms

Automated Summary

Missing data is a significant problem in knowledge extraction. This paper proposes an approach that uses evolutionary algorithms to evaluate how useful imputing each incomplete feature is for the performance of the prediction model. The method selects the subset of incomplete features whose imputation enhances the learning process and maximizes predictive power. In experiments, it outperformed five classical imputation methods and, in terms of accuracy on 75% of the datasets, three recent evolutionary-based imputation methods.

Abstract

Missing data is a considerable problem in knowledge extraction, where the completeness and quality of the data play a major role in data analysis. In many applications, ignoring records with missing values may adversely affect the prediction process and create a significant bias in the resulting data. Missing Data Imputation (MDI) has therefore become mandatory for tackling the negative consequences of missing data. However, different features respond differently to imputation: depending on the feature's properties, imputing some features can enhance the learning process, while imputing others may lead to worse results. This paper proposes the use of evolutionary algorithms to evaluate how useful the imputation of each feature is for the performance of the prediction model, in order to select the best subset of incomplete features that, once properly handled, can enhance the learning process and maximize the prediction power of the model. The approach thus handles missing values and performs feature selection simultaneously, enhancing the model's learning performance and reducing the negative consequences of imputation. The performance of the proposed method was evaluated on 10 benchmark datasets under 10-fold cross-validation. The results were compared with five classical imputation methods (mean, median, multiple imputation, expectation maximization, and K-nearest neighbours). The proposed methodology significantly outperformed these methods in terms of accuracy, sensitivity, specificity, geometric mean, and area under the curve. Moreover, the proposed method was compared against three recent evolutionary-based imputation methods, which it outperformed in terms of accuracy on 75% of the datasets. © 2021 Elsevier B.V. All rights reserved.
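The idea in the abstract, an evolutionary search over which incomplete features are worth imputing, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the encoding, the imputer, the classifier, or the genetic operators, so everything below is an assumption made for illustration. Each chromosome is a bit mask over the incomplete columns (1 = mean-impute and keep, 0 = drop the column), fitness is the leave-one-out accuracy of a 1-nearest-neighbour classifier, and the function names (impute_and_select, loo_accuracy, evolve_mask) are hypothetical.

```python
import random
from statistics import mean


def impute_and_select(rows, incomplete, mask):
    """Mean-impute incomplete columns whose mask bit is 1; drop those with bit 0."""
    dropped = {c for c, bit in zip(incomplete, mask) if bit == 0}
    keep = [c for c in range(len(rows[0])) if c not in dropped]
    # Assumed imputer: column mean over observed values. For brevity the mean
    # is computed on all rows, which a real evaluation would do per fold.
    fill = {c: mean(r[c] for r in rows if r[c] is not None) for c in keep}
    return [[r[c] if r[c] is not None else fill[c] for c in keep] for r in rows]


def loo_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier (assumed learner)."""
    hits = 0
    for i, xi in enumerate(X):
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(xi, X[j])),
        )
        hits += y[nearest] == y[i]
    return hits / len(X)


def evolve_mask(rows, y, incomplete, pop_size=12, generations=15, p_mut=0.1, seed=0):
    """Search for the impute/drop mask with the highest fitness."""
    rng = random.Random(seed)
    cache = {}

    def fitness(mask):
        if mask not in cache:
            cache[mask] = loo_accuracy(impute_and_select(rows, incomplete, mask), y)
        return cache[mask]

    # Seed the population with the "impute everything" baseline so that,
    # together with elitism, the result is never worse than that baseline.
    pop = [tuple(1 for _ in incomplete)]
    pop += [tuple(rng.randint(0, 1) for _ in incomplete) for _ in range(pop_size - 1)]

    for _ in range(generations):
        nxt = sorted(pop, key=fitness, reverse=True)[:2]  # elitism: keep the top two
        while len(nxt) < pop_size:
            # Tournament selection of two parents, three candidates each.
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            # Uniform crossover followed by bit-flip mutation.
            child = tuple(a if rng.random() < 0.5 else b for a, b in zip(p1, p2))
            child = tuple(1 - g if rng.random() < p_mut else g for g in child)
            nxt.append(child)
        pop = nxt

    best = max(pop, key=fitness)
    return best, fitness(best)
```

Calling evolve_mask(rows, y, incomplete) with rows given as lists of floats (None marking a missing value), y as class labels, and incomplete as the indices of the columns that contain missing values returns the best mask found and its fitness. The sketch assumes every column has at least one observed value and that pop_size is at least 3.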
