☆ 4.7 Article

A genetic algorithm for multivariate missing data imputation

INFORMATION SCIENCES (2023)

Journal

INFORMATION SCIENCES

Volume 619, Pages 947-967

Publisher

ELSEVIER SCIENCE INC

DOI: 10.1016/j.ins.2022.11.037

Keywords

Missing data; Genetic algorithms; Multivariate missing data; Data imputation

Category

Computer Science, Information Systems

Summary

This paper presents a genetic algorithm for imputing multiple missing observations in multivariate data, using a new multi-objective function based on the Minkowski distance of the means, variances, covariances, and skewness. The algorithm is tested on a continuous/discrete dataset, where it is compared to the EM algorithm and auxiliary regressions, and on seven benchmark datasets.

Abstract

Some data mining, AI and data processing tasks might have data loss whose estimation/imputation is an important problem to be solved. Genetic algorithms are efficient and flexible global optimization methods able to deal with both multiple missing observations and multiple features such as continuous/discrete/binary data, which are often found in multivariate databases, unlike classical missing data estimation methods, which only deal with univariate continuous data. This paper presents a genetic algorithm to impute multiple missing observations in multivariate data which minimizes a new multi-objective (fitness) function based on the Minkowski distance of the means, variances, covariances and skewness between available/completed data. To do so, two sets of examples were tested: a continuous/discrete dataset which is compared to both the EM algorithm and auxiliary regressions, and a comparison over seven benchmark datasets.

© 2022 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
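To make the idea in the abstract concrete, here is a minimal Python sketch of a moment-matching fitness function and a small genetic algorithm that minimizes it. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact objective, weights, or genetic operators, so the names `moment_vector`, `fitness` and `impute_ga`, the unweighted Minkowski distance with `p=2`, the truncation selection, uniform crossover and Gaussian mutation are all hypothetical choices. The sketch also handles continuous columns only, whereas the paper covers discrete and binary data as well.

```python
import numpy as np


def moment_vector(X):
    """Stack NaN-aware means, variances, pairwise covariances and skewness."""
    X = np.asarray(X, dtype=float)
    mean = np.nanmean(X, axis=0)
    centered = X - mean
    var = np.nanmean(centered**2, axis=0)
    std = np.sqrt(var)
    # Guard against zero variance when standardizing the third moment.
    skew = np.nanmean(centered**3, axis=0) / np.where(std > 0, std**3, 1.0)
    n_cols = X.shape[1]
    # Covariances over rows where both columns are observed.
    cov = [np.nanmean(centered[:, i] * centered[:, j])
           for i in range(n_cols) for j in range(i + 1, n_cols)]
    return np.concatenate([mean, var, cov, skew])


def fitness(candidate, X, mask, p=2):
    """Minkowski distance between moments of completed and available data."""
    completed = X.copy()
    completed[mask] = candidate  # fill missing cells in row-major order
    diff = np.abs(moment_vector(completed) - moment_vector(X))
    return float(np.sum(diff**p) ** (1.0 / p))


def impute_ga(X, pop_size=40, generations=150, p=2, mutation_scale=0.1, seed=0):
    """Assumed GA: truncation selection, uniform crossover, Gaussian mutation."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)
    cols = np.nonzero(mask)[1]  # column of each missing cell, row-major order
    lo = np.nanmin(X, axis=0)[cols]  # search bounds: observed range per column
    hi = np.nanmax(X, axis=0)[cols]
    pop = rng.uniform(lo, hi, size=(pop_size, cols.size))
    for _ in range(generations):
        scores = np.array([fitness(ind, X, mask, p) for ind in pop])
        elite = pop[np.argsort(scores)[: pop_size // 2]]  # keep the best half
        n_children = pop_size - len(elite)
        parent_a = elite[rng.integers(0, len(elite), n_children)]
        parent_b = elite[rng.integers(0, len(elite), n_children)]
        children = np.where(rng.random(parent_a.shape) < 0.5, parent_a, parent_b)
        children = children + rng.normal(0.0, mutation_scale, children.shape) * (hi - lo)
        pop = np.vstack([elite, np.clip(children, lo, hi)])
    scores = np.array([fitness(ind, X, mask, p) for ind in pop])
    completed = X.copy()
    completed[mask] = pop[np.argmin(scores)]
    return completed, float(scores.min())


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = rng.normal(size=(30, 3))
    data[[2, 7, 11, 20], [0, 1, 2, 1]] = np.nan  # knock out four cells
    completed, score = impute_ga(data)
    print("imputed values:", completed[np.isnan(data)])
    print("final fitness:", score)
```

Each individual encodes one value per missing cell, so several missing observations across several columns are estimated jointly. A lower fitness means the completed dataset's moments stay closer to those of the available data.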
