Article

All sparse PCA models are wrong, but some are useful. Part II: Limitations and problems of deflation

Publisher

ELSEVIER
DOI: 10.1016/j.chemolab.2020.104212

Keywords

Artifacts; Data interpretation; Exploratory data analysis; Model interpretation; Sparse principal component analysis; Sparsity

Funding

  1. Spanish Ministry of Economy and Competitiveness
  2. ERDF (European Regional Development Fund) [TIN2017-83494-R]
  3. Plan Propio de la Universidad de Granada
  4. Netherlands Organisation for Health Research and Development (ZonMW) [456008002]

Sparse Principal Component Analysis (sPCA) is a popular matrix factorization approach based on Principal Component Analysis (PCA). It combines variance maximization and sparsity with the ultimate goal of improving data interpretation. A main application of sPCA is to handle high-dimensional data, for example biological omics data. In Part I of this series, we illustrated limitations of several state-of-the-art sPCA algorithms when modeling noise-free data simulated following an exact sPCA model. In this Part II, we provide a thorough analysis of the limitations of sPCA methods that use deflation for calculating subsequent, higher-order components. We show, both theoretically and numerically, that deflation can lead to problems in model interpretation, even for noise-free data. In addition, we contribute diagnostics to identify modeling problems in real-data analysis.
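The abstract refers to deflation for computing subsequent components. As a hedged illustration only, the sketch below assumes one common scheme, projection deflation of the data matrix (X ← X − t pᵀ with t = X p and p unit-norm), and compares ordinary PCA loadings with two hand-picked sparse loadings. The sparse loadings `p1` and `p2` are hypothetical and are not produced by any sPCA algorithm; the helper `deflate` is likewise illustrative. The sketch shows one general property: with PCA eigenvector loadings, the scores of successive components are orthogonal, whereas with sparse loadings they generally are not. It does not reproduce the article's analysis or its proposed diagnostics.

```python
import numpy as np

def deflate(X, p):
    """Projection deflation (illustrative): remove the rank-one part t p^T, with t = X p."""
    p = p / np.linalg.norm(p)      # work with a unit-norm loading vector
    t = X @ p                      # scores of this component
    return X - np.outer(t, p), t

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
X -= X.mean(axis=0)                # column-centre the data, as in PCA

# Ordinary PCA: loadings are the right singular vectors of X (eigenvectors of X^T X)
_, _, Vt = np.linalg.svd(X, full_matrices=False)
X1, t1 = deflate(X, Vt[0])
_, t2 = deflate(X1, Vt[1])
print(abs(t1 @ t2))                # numerically zero: successive scores are orthogonal

# Hypothetical sparse loadings (hand-picked zeros, not fitted by any sPCA method)
p1 = np.array([1.0, 1.0, 0.0, 0.0])
p2 = np.array([0.0, 1.0, 1.0, 0.0])
S1, s1 = deflate(X, p1)
_, s2 = deflate(S1, p2)
print(abs(s1 @ s2))                # generally non-zero: scores are no longer orthogonal
```

Deflation still removes the direction of each loading (for example, `S1 @ p1` is numerically zero), but because sparse loadings are not eigenvectors of XᵀX, the variance they capture overlaps across components.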
