4.7 Article

A bi-objective k-nearest-neighbors-based imputation method for multilevel data

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 204, Issue -, Pages -

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2022.117298

Keywords

Multilevel data; Imputation; k-nearest neighbors

Funding

  1. AUFF NOVA grant [AUFF-E-2019-9-3]

Abstract

This study proposes a bi-objective algorithm based on the k-nearest neighbors method for imputing missing values in multilevel data with continuous variables. Simulation results show that the proposed method outperforms benchmark methods, especially when intraclass correlation is high, and that it reduces the bias in the estimates and in the coefficient of determination of fitted multilevel regression models.
We propose biokNN, a bi-objective algorithm based on the k-nearest neighbors method, to impute missing values in multilevel data with continuous variables. We formulate imputation as a bi-objective minimization problem and propose a solution algorithm based on a weighted objective function. The algorithm seeks imputed values that balance two dissimilarities: the dissimilarity to the k nearest neighbors and the dissimilarity to the observations within the same cluster. We evaluate the method in a simulation study and compare its results with those of eight benchmark imputation methods. The simulated datasets are generated from a varying-intercept, varying-slope multilevel model, and the results are compared both with well-known accuracy metrics and by estimating the bias of the estimates after inference has been performed. The simulation tests the effects of different configurations of multilevel datasets, including the number of clusters, their size, their similarity, the percentage of missing values, and the effect of imbalanced clusters. The results show that the proposed method outperforms the benchmark methods, especially in cases with high intraclass correlation. A comparison of fitted linear multilevel regression models shows that our method can also reduce the bias in the estimates and in the coefficient of determination. Finally, the method is tested on three commonly used machine learning datasets, where it is more accurate than the benchmark methods in most cases.
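
The weighted-objective idea can be illustrated with a minimal Python sketch. This is not the authors' biokNN implementation: the function name `impute_weighted`, the weight `alpha`, the squared-difference dissimilarity, and the single-pass (non-iterative) fill are all assumptions made for illustration. Under those assumptions, the value that minimizes the weighted sum of the two mean squared dissimilarities has a closed form: `alpha` times the mean of the k nearest neighbors plus `1 - alpha` times the mean of the same-cluster observations.

```python
import numpy as np


def impute_weighted(X, clusters, k=3, alpha=0.5):
    """Illustrative weighted-objective imputation (hypothetical helper).

    For each missing entry v in column j, minimize
        alpha * mean((v - x_n)^2 over the k nearest neighbors)
      + (1 - alpha) * mean((v - x_c)^2 over rows in the same cluster).
    The minimizer is alpha * mean_knn + (1 - alpha) * mean_cluster.
    """
    X = np.asarray(X, dtype=float)
    clusters = np.asarray(clusters)
    missing = np.isnan(X)
    out = X.copy()

    for i, j in zip(*np.nonzero(missing)):
        # Donors are the rows where column j is observed.
        donors = np.nonzero(~missing[:, j])[0]
        if donors.size == 0:
            continue  # nothing to borrow from; leave the entry missing

        # Distance uses only columns observed in both row i and the donor.
        shared = ~missing[i] & ~missing[donors]
        diff = np.where(shared, X[donors] - X[i], 0.0)
        n_shared = shared.sum(axis=1)
        dist = np.where(
            n_shared > 0,
            (diff ** 2).sum(axis=1) / np.maximum(n_shared, 1),
            np.inf,
        )

        # Objective 1: stay close to the k nearest neighbors.
        nearest = donors[np.argsort(dist, kind="stable")[:k]]
        knn_mean = X[nearest, j].mean()

        # Objective 2: stay close to observations in the same cluster.
        same = donors[clusters[donors] == clusters[i]]
        cluster_mean = X[same, j].mean() if same.size else knn_mean

        out[i, j] = alpha * knn_mean + (1 - alpha) * cluster_mean
    return out


# Toy data: the last row is missing its second variable.
X_demo = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [1.1, np.nan]])
clusters_demo = np.array([0, 0, 1, 1])
X_filled = impute_weighted(X_demo, clusters_demo, k=1, alpha=0.5)
```

In the toy data, the nearest neighbor of the last row belongs to a different cluster than the row itself, so `alpha` decides how far the imputed value leans toward the neighbor versus the cluster. Setting `alpha=1` reduces the sketch to plain k-nearest-neighbors mean imputation, and `alpha=0` to cluster-mean imputation.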
