Journal
CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS
Volume 208, Issue -, Pages -
Publisher
ELSEVIER
DOI: 10.1016/j.chemolab.2020.104212
Keywords
Artifacts; Data interpretation; Exploratory data analysis; Model interpretation; Sparse principal component analysis; Sparsity
Funding
- Spanish Ministry of Economy and Competitiveness
- ERDF (European Regional Development Fund) [TIN2017-83494-R]
- Plan Propio de la Universidad de Granada
- Netherlands Organisation for Health Research and Development (ZonMW) [456008002]
Sparse Principal Component Analysis (sPCA) is a matrix factorization approach based on Principal Component Analysis (PCA) that aims to improve data interpretation, particularly for high-dimensional biological omics data. Part I of this series highlighted limitations of state-of-the-art sPCA algorithms when modeling noise-free data, while Part II focuses on analyzing the drawbacks of sPCA methods using deflation for calculating subsequent components, showing potential problems in model interpretation even for noise-free data. New diagnostics are proposed to identify modeling issues in real-data analysis.
Sparse Principal Component Analysis (sPCA) is a popular matrix factorization approach based on Principal Component Analysis (PCA). It combines variance maximization and sparsity with the ultimate goal of improving data interpretation. A main application of sPCA is handling high-dimensional data, for example biological omics data. In Part I of this series, we illustrated limitations of several state-of-the-art sPCA algorithms when modeling noise-free data simulated following an exact sPCA model. In this Part II, we provide a thorough analysis of the limitations of sPCA methods that use deflation to calculate subsequent, higher-order components. We show, both theoretically and numerically, that deflation can lead to problems in model interpretation, even for noise-free data. In addition, we contribute diagnostics to identify modeling problems in real-data analysis.
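To make the term "deflation" concrete, the following is a minimal sketch of sequential sparse PCA, not the algorithms analyzed in the paper. It assumes a simple truncated power iteration to obtain each sparse loading and projection deflation to remove it from the data; the function names `sparse_loading` and `spca_projection_deflation` and the parameter `k` (non-zeros per loading) are hypothetical and chosen only for illustration.

```python
import numpy as np


def sparse_loading(X, k, n_iter=200):
    """Truncated power iteration (illustrative assumption, not the paper's method).

    Returns a unit-norm loading vector with at most k non-zero entries.
    """
    C = X.T @ X
    # Deterministic start: the variable with the largest sum of squares.
    p = np.zeros(C.shape[0])
    p[np.argmax(np.diag(C))] = 1.0
    for _ in range(n_iter):
        q = C @ p
        # Keep the k largest-magnitude entries and zero out the rest.
        keep = np.argsort(np.abs(q))[-k:]
        q_sparse = np.zeros_like(q)
        q_sparse[keep] = q[keep]
        norm = np.linalg.norm(q_sparse)
        if norm == 0:
            break
        p = q_sparse / norm
    return p


def spca_projection_deflation(X, n_components, k):
    """Extract sparse loadings one at a time, deflating the data after each."""
    X_res = X - X.mean(axis=0)  # column-center the data
    loadings = []
    for _ in range(n_components):
        p = sparse_loading(X_res, k)
        loadings.append(p)
        # Projection deflation: remove the part of the data along p.
        X_res = X_res - np.outer(X_res @ p, p)
    return np.column_stack(loadings), X_res


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
P, X_res = spca_projection_deflation(X, n_components=2, k=3)
```

In a sketch like this, the sparse loadings are in general not mutually orthogonal, so after the second deflation step the residual can again contain variation along the first loading. Checking `X_res @ P[:, 0]` against zero is one simple way to observe that effect on a given data set.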