Article

Chunk-wise regularised PCA-based imputation of missing data

Journal

STATISTICAL METHODS AND APPLICATIONS
Volume 31, Issue 2, Pages 365-386

Publisher

SPRINGER HEIDELBERG
DOI: 10.1007/s10260-021-00575-5

Keywords

Principal components; Missing data; Eigenspace arithmetics

Funding

  1. Università degli Studi di Napoli Federico II


This paper proposes two chunk-wise implementations of RPCA suitable for tall data sets: one for distributed computation and one for incremental computation. Experimental results show that the distributed approach performs similarly to batch RPCA when entries are missing completely at random, while the incremental approach performs well when entries are missing not completely at random, provided the first analysed chunks contain sufficient information on the data structure.
Standard multivariate techniques like Principal Component Analysis (PCA) are based on the eigendecomposition of a matrix and therefore require complete data sets. Recent comparative reviews of PCA algorithms for missing data showed the regularised iterative PCA algorithm (RPCA) to be effective. This paper presents two chunk-wise implementations of RPCA suitable for the imputation of tall data sets, that is, data sets with many observations. A chunk is a subset of the whole set of available observations. One implementation is suitable for distributed computation, as it imputes each chunk independently. The other is suitable for incremental computation, where the imputation of each new chunk is based on all the chunks analysed so far. The proposed procedures were compared to batch RPCA on different data sets and under different missing data mechanisms. Experimental results showed that the distributed approach performed similarly to batch RPCA for data with entries missing completely at random. The incremental approach performed appreciably well when the data were missing not completely at random and the first analysed chunks contained sufficient information on the data structure.
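
The distributed variant described above, in which each chunk is imputed independently, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `rpca_impute` and `impute_in_chunks`, the parameters `n_components`, `n_iter`, `tol` and `chunk_size`, and the particular shrinkage rule (subtracting a noise estimate taken from the discarded singular values) are assumptions made for illustration. The incremental variant, which would carry information from earlier chunks forward, is not shown.

```python
import numpy as np

def rpca_impute(X, n_components=2, n_iter=100, tol=1e-6):
    """Simplified regularised iterative PCA imputation (illustrative only).

    Assumes missing entries are NaN and that no column is entirely missing.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start from column-mean imputation.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        mean = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mean, full_matrices=False)
        # Noise level estimated from the discarded singular values (assumption).
        noise = np.mean(s[n_components:] ** 2) if len(s) > n_components else 0.0
        s_k = s[:n_components]
        # Shrink the retained singular values towards zero.
        shrunk = np.maximum(s_k ** 2 - noise, 0.0) / np.where(s_k > 0, s_k, 1.0)
        fitted = (U[:, :n_components] * shrunk) @ Vt[:n_components] + mean
        # Only missing cells are updated; observed cells keep their values.
        new_filled = np.where(missing, fitted, X)
        change = np.sum((new_filled - filled) ** 2)
        filled = new_filled
        if change < tol:
            break
    return filled

def impute_in_chunks(X, chunk_size, **kwargs):
    """Impute each block of rows independently, then stack the results."""
    X = np.asarray(X, dtype=float)
    parts = [rpca_impute(X[i:i + chunk_size], **kwargs)
             for i in range(0, X.shape[0], chunk_size)]
    return np.vstack(parts)

# Usage on synthetic low-rank data with about 10% of entries removed at random.
rng = np.random.default_rng(0)
X_full = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 5))
X_full += 0.05 * rng.normal(size=X_full.shape)
mask = rng.random(X_full.shape) < 0.1
X_obs = X_full.copy()
X_obs[mask] = np.nan
X_imputed = impute_in_chunks(X_obs, chunk_size=50, n_components=2)
```

Because the chunks share no state, the calls to `rpca_impute` could in principle run on separate workers, which is what makes this arrangement a natural fit for distributed computation.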
