Journal
INFORMATION SCIENCES
Volume 619, Issue -, Pages 947-967
Publisher
ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2022.11.037
Keywords
Missing data; Genetic algorithms; Multivariate missing data; Data imputation
This paper presents a genetic algorithm for imputing multiple missing observations in multivariate data, using a new multi-objective function based on the Minkowski distance of the means, variances, covariances, and skewness. The algorithm is tested on a continuous/discrete dataset, where it is compared to the EM algorithm and auxiliary regressions, and on seven benchmark datasets.
Data mining, AI, and data processing tasks may suffer data loss, and estimating (imputing) the lost values is an important problem. Genetic algorithms are efficient and flexible global optimization methods that can handle both multiple missing observations and multiple feature types (continuous, discrete, and binary), which are often found together in multivariate databases. Classical missing data estimation methods, by contrast, only deal with univariate continuous data. This paper presents a genetic algorithm to impute multiple missing observations in multivariate data. It minimizes a new multi-objective (fitness) function based on the Minkowski distance of the means, variances, covariances, and skewness between the available and the completed data. Two sets of examples were tested: a continuous/discrete dataset, on which the method is compared to both the EM algorithm and auxiliary regressions, and a comparison over seven benchmark datasets.
(c) 2022 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
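The abstract describes the fitness function only in words, so the following is a rough sketch of how such a function could look, not the authors' implementation. It assumes missing entries are marked as NaN, that moments of the available data are computed from the observed entries only (pairwise-complete observations for covariances), and that the four Minkowski distances are simply summed into one score. The helper names (complete, moment_groups, minkowski, fitness), the scalarization by summation, and the default order p=2 are all illustrative assumptions; the paper may weight or combine the objectives differently.

```python
import numpy as np

def complete(data, candidate):
    """Copy `data` and fill its NaN entries (row-major order) with `candidate`."""
    filled = data.copy()
    filled[np.isnan(data)] = candidate
    return filled

def moment_groups(data):
    """Means, variances, pairwise covariances and skewness, ignoring NaNs.

    Assumes no column is constant (a zero variance would break the skewness).
    """
    n_cols = data.shape[1]
    means = np.nanmean(data, axis=0)
    centered = data - means
    variances = np.nanmean(centered ** 2, axis=0)
    skewness = np.nanmean(centered ** 3, axis=0) / variances ** 1.5
    covariances = []
    for i in range(n_cols):
        for j in range(i + 1, n_cols):
            # Use only rows where both columns are observed.
            both = ~np.isnan(data[:, i]) & ~np.isnan(data[:, j])
            xi, xj = data[both, i], data[both, j]
            covariances.append(np.mean((xi - xi.mean()) * (xj - xj.mean())))
    return means, variances, np.array(covariances), skewness

def minkowski(u, v, p=2):
    """Minkowski distance of order p between two vectors."""
    return np.sum(np.abs(u - v) ** p) ** (1.0 / p)

def fitness(data, candidate, p=2):
    """Sum of Minkowski distances between available-data and completed-data moments.

    Summing the four distances is an assumed scalarization, chosen for brevity.
    """
    available = moment_groups(data)
    completed = moment_groups(complete(data, candidate))
    return sum(minkowski(a, b, p) for a, b in zip(available, completed))

# Toy data with two missing entries; a genetic algorithm would search for the
# candidate vector (one gene per missing entry) that minimizes this score.
data = np.array([
    [1.0, 2.0],
    [2.0, np.nan],
    [3.0, 6.0],
    [np.nan, 8.0],
    [5.0, 10.0],
])
score = fitness(data, np.array([4.0, 4.0]))
```

In this framing, each individual in the population is one candidate vector of imputed values, and lower scores mean the completed data better preserve the moments of the available data.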