Article

EvoImputer: An evolutionary approach for Missing Data Imputation and feature selection in the context of supervised learning

Journal

Knowledge-Based Systems
Volume 236

Publisher

Elsevier
DOI: 10.1016/j.knosys.2021.107734

Keywords

Missing data; Data imputation; Evolutionary algorithms


Missing data is a significant problem in knowledge extraction, and this paper proposes a new approach that uses evolutionary algorithms to evaluate the usefulness of data imputation for prediction model performance. The method selects the best subset of incomplete features that can enhance the learning process and maximize the prediction power. The proposed method outperforms traditional imputation methods and other evolutionary-based imputation methods in experiments.
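The idea of evolving a subset of incomplete features to impute can be illustrated with a small genetic algorithm. This is a minimal sketch under assumptions the abstract does not specify: each chromosome has one bit per incomplete feature (1 = keep the feature and fill its gaps with the training-fold mean, 0 = drop it), complete features are always kept, and fitness is the 5-fold cross-validated accuracy of a 1-nearest-neighbour classifier (5 folds only because the toy data is tiny). The helper names (`cv_accuracy`, `evolve`), the GA parameters, and the toy data are hypothetical; this is not EvoImputer's actual encoding, imputer, or classifier.

```python
import random
from statistics import mean

# Hypothetical sketch: a GA choosing which incomplete features to impute-and-keep vs. drop.

def column_mean(rows, j):
    """Mean of the observed (non-None) values in column j, or 0.0 if none are observed."""
    observed = [r[j] for r in rows if r[j] is not None]
    return mean(observed) if observed else 0.0

def transform(train, test, keep):
    """Keep only the columns in `keep`; fill gaps with training-fold means (no test leakage)."""
    means = {j: column_mean(train, j) for j in keep}
    def fill(r):
        return [means[j] if r[j] is None else r[j] for j in keep]
    return [fill(r) for r in train], [fill(r) for r in test]

def nn_predict(train_x, train_y, x):
    """1-nearest-neighbour prediction using squared Euclidean distance."""
    best = min(range(len(train_x)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)))
    return train_y[best]

def cv_accuracy(rows, labels, keep, k=5):
    """k-fold cross-validated 1-NN accuracy on the kept (mean-imputed) columns."""
    if not keep:
        return 0.0
    n, correct = len(rows), 0
    for f in range(k):
        test_idx = list(range(f, n, k))                   # interleaved fold assignment
        train_idx = [i for i in range(n) if i % k != f]
        tx, ex = transform([rows[i] for i in train_idx], [rows[i] for i in test_idx], keep)
        ty = [labels[i] for i in train_idx]
        correct += sum(nn_predict(tx, ty, x) == labels[i] for x, i in zip(ex, test_idx))
    return correct / n

def evolve(rows, labels, pop_size=10, generations=15, p_mut=0.1, seed=0):
    """Return (selected column indices, CV accuracy) of the fittest chromosome found."""
    rng = random.Random(seed)
    d = len(rows[0])
    incomplete = [j for j in range(d) if any(r[j] is None for r in rows)]
    complete = [j for j in range(d) if j not in incomplete]

    def decode(bits):  # complete columns are always kept
        return sorted(complete + [j for j, b in zip(incomplete, bits) if b])

    cache = {}
    def fitness(bits):  # cache so repeated chromosomes are not re-evaluated
        key = tuple(bits)
        if key not in cache:
            cache[key] = cv_accuracy(rows, labels, decode(bits))
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in incomplete] for _ in range(pop_size)]
    for _ in range(generations):
        children = [max(pop, key=fitness)[:]]             # elitism: keep the best unchanged
        while len(children) < pop_size:
            a = max(rng.sample(pop, 2), key=fitness)      # binary tournament selection
            b = max(rng.sample(pop, 2), key=fitness)
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]  # uniform crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]   # bit-flip mutation
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    return decode(best), fitness(best)

# Toy data: column 0 is complete and separates the classes; column 1 is incomplete noise.
ROWS = [[0.0, 0.0], [10.0, 1.0], [0.0, 200.0], [10.0, 201.0], [0.0, 400.0],
        [10.0, 401.0], [0.0, 600.0], [10.0, 601.0], [0.0, 800.0], [10.0, None]]
LABELS = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

if __name__ == "__main__":
    keep, acc = evolve(ROWS, LABELS)
    print(keep, acc)
```

On the toy data, imputing and keeping column 1 lets its large, uninformative values dominate the distance computation and causes misclassifications, so the search should settle on keeping column 0 alone. This mirrors, in miniature, the abstract's point that imputing some features helps while imputing others hurts.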
Missing data is a considerable problem in knowledge extraction, where the completeness and quality of the data play a major role in data analysis. In many applications, ignoring records with missing values may adversely affect the prediction process and introduce significant bias into the resulting data. Missing Data Imputation (MDI) has therefore become essential for countering the negative consequences of missing data. However, features respond differently to imputation: depending on a feature's properties, imputing it can enhance the learning process or lead to worse results. This paper proposes using evolutionary algorithms to evaluate how imputing each feature affects the performance of the prediction model, in order to select the subset of incomplete features that best enhances the learning process and maximizes the model's predictive power once the missing values have been handled. The resulting approach handles missing values and performs feature selection simultaneously, improving the model's learning performance and reducing the negative consequences of imputation. The proposed method was evaluated on 10 benchmark datasets using 10-fold cross-validation and compared with five classical imputation methods (mean, median, multiple imputation, expectation maximization, and K-nearest neighbours). The proposed methodology significantly outperformed these methods in terms of accuracy, sensitivity, specificity, geometric mean, and area under the curve (AUC). It was also compared against three recent evolutionary-based imputation methods and outperformed them in terms of accuracy on 75% of the datasets. (C) 2021 Elsevier B.V. All rights reserved.
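The evaluation criteria named above can all be derived from a binary classifier's predictions. The sketch below uses the standard textbook definitions, which are assumed here because the abstract does not say exactly how the authors computed or averaged them across folds: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), geometric mean = sqrt(sensitivity × specificity), and AUC as the Mann-Whitney probability that a randomly chosen positive is scored above a randomly chosen negative (ties count as one half). The function names are hypothetical.

```python
import math

def confusion(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) counts for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn

def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (TPR), specificity (TNR) and their geometric mean."""
    tp, tn, fp, fn = confusion(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": math.sqrt(sensitivity * specificity),
    }

def auc(y_true, scores, positive=1):
    """AUC as P(score of a random positive > score of a random negative), ties = 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    m = binary_metrics([1, 1, 1, 1, 0, 0, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 0, 1, 1])
    print(m)  # accuracy 0.7, sensitivity 0.75, specificity ~0.667, g_mean ~0.707
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

The geometric mean is included because, unlike plain accuracy, it stays low when a model does well on the majority class but poorly on the minority class.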
