Article

A genetic algorithm for multivariate missing data imputation

Journal

INFORMATION SCIENCES
Volume 619, Pages 947-967

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2022.11.037

Keywords

Missing data; Genetic algorithms; Multivariate missing data; Data imputation

This paper presents a genetic algorithm for imputing multiple missing observations in multivariate data, using a new multi-objective function based on the Minkowski distance of the means, variances, covariances, and skewness. The algorithm is tested on a continuous/discrete dataset, where it is compared with the EM algorithm and auxiliary regressions, and on seven benchmark datasets.
Some data mining, AI, and data processing tasks suffer from data loss, whose estimation/imputation is an important problem to be solved. Genetic algorithms are efficient and flexible global optimization methods able to deal with both multiple missing observations and multiple feature types, such as continuous/discrete/binary data, which are often found in multivariate databases, unlike classical missing data estimation methods, which only deal with univariate continuous data. This paper presents a genetic algorithm to impute multiple missing observations in multivariate data that minimizes a new multi-objective (fitness) function based on the Minkowski distance of the means, variances, covariances, and skewness between the available and completed data. To do so, two sets of examples were tested: a continuous/discrete dataset, which is compared to both the EM algorithm and auxiliary regressions, and a comparison over seven benchmark datasets. (c) 2022 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
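The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact fitness formula or the genetic operators, so everything below is an assumption made for illustration. The names `moments`, `fitness`, and `impute_ga` are hypothetical, the several objectives are collapsed into a single Minkowski distance of order `r`, and the operators (truncation selection, uniform crossover, mutation by resampling observed values) are generic choices. Missing cells are assumed to be marked with NaN, and every column is assumed to be non-constant.

```python
import numpy as np

def moments(X):
    """NaN-aware means, variances, pairwise covariances, and skewness per column."""
    p = X.shape[1]
    mean = np.nanmean(X, axis=0)
    var = np.nanvar(X, axis=0, ddof=1)
    cov = []
    for i in range(p):
        for j in range(i + 1, p):
            # use only rows where both columns are observed
            both = ~np.isnan(X[:, i]) & ~np.isnan(X[:, j])
            cov.append(np.cov(X[both, i], X[both, j])[0, 1])
    # assumes non-constant columns, so the standard deviation is nonzero
    skew = np.nanmean((X - mean) ** 3, axis=0) / np.nanstd(X, axis=0) ** 3
    return np.concatenate([mean, var, cov, skew])

def fitness(candidate, X, mask, target, r=2.0):
    """Minkowski distance (order r) between completed-data and available-data moments."""
    completed = X.copy()
    completed[mask] = candidate  # fills missing cells in row-major order
    diff = np.abs(moments(completed) - target)
    return np.sum(diff ** r) ** (1.0 / r)

def impute_ga(X, pop_size=40, generations=100, r=2.0, mutation_rate=0.1, seed=0):
    """Assumed generic GA: one gene per missing cell, drawn from observed values."""
    rng = np.random.default_rng(seed)
    mask = np.isnan(X)
    cols = np.nonzero(mask)[1]  # column of each missing cell, row-major order
    observed = [X[~mask[:, j], j] for j in range(X.shape[1])]
    target = moments(X)  # statistics of the available data

    def random_genes(size):
        # sampling observed values keeps discrete/binary columns valid
        return np.column_stack([rng.choice(observed[j], size=size) for j in cols])

    def score(pop):
        return np.array([fitness(ind, X, mask, target, r) for ind in pop])

    pop = random_genes(pop_size)
    for _ in range(generations):
        order = np.argsort(score(pop))
        parents = pop[order[: pop_size // 2]]  # truncation selection
        n_children = pop_size - 1
        a = parents[rng.integers(len(parents), size=n_children)]
        b = parents[rng.integers(len(parents), size=n_children)]
        children = np.where(rng.random(a.shape) < 0.5, a, b)  # uniform crossover
        mutate = rng.random(children.shape) < mutation_rate
        children = np.where(mutate, random_genes(n_children), children)
        pop = np.vstack([parents[:1], children])  # elitism: keep the best individual
    best = pop[np.argmin(score(pop))]
    completed = X.copy()
    completed[mask] = best
    return completed
```

Calling `impute_ga(X)` on a float array with NaN entries returns a copy in which only the missing cells have been filled. Because genes are only ever drawn from values already observed in the same column, this sketch can never produce an out-of-range category for a discrete column, at the cost of not interpolating new values for continuous ones.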
