4.2 Article

Imputation techniques for multivariate missingness in software measurement data

期刊

SOFTWARE QUALITY JOURNAL
卷 16, 期 4, 页码 563-600

出版社

SPRINGER
DOI: 10.1007/s11219-008-9054-7

关键词

Imputation; Software quality; Missing data; Data quality; Bayesian multiple imputation

向作者/读者索取更多资源

The problem of missing values in software measurement data used in empirical analysis has led to the proposal of numerous potential solutions. Imputation procedures, for example, have been proposed to 'fill-in' the missing values with plausible alternatives. We present a comprehensive study of imputation techniques using real-world software measurement datasets. Two different datasets with dramatically different properties were utilized in this study, with the injection of missing values according to three different missingness mechanisms (MCAR, MAR, and NI). We consider the occurrence of missing values in multiple attributes, and compare three procedures, Bayesian multiple imputation, k Nearest Neighbor imputation, and Mean imputation. We also examine the relationship between noise in the dataset and the performance of the imputation techniques, which has not been addressed previously. Our comprehensive experiments demonstrate conclusively that Bayesian multiple imputation is an extremely effective imputation technique.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.2
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据