Article

Sparse Partial Least Squares Classification for High Dimensional Data

Publisher

WALTER DE GRUYTER GMBH
DOI: 10.2202/1544-6115.1492

Keywords

partial least squares; classification; variable selection; dimension reduction; two-stage PLS; iteratively re-weighted partial least squares; gene expression

Funding

  1. NSF [DMS 0804597]
  2. NIH [HG03747]
  3. NATIONAL HUMAN GENOME RESEARCH INSTITUTE [R01HG003747] Funding Source: NIH RePORTER

Abstract

Partial least squares (PLS) is a well-known dimension reduction method that has recently been adapted for high dimensional classification problems in genome biology. We develop sparse versions of two recently proposed PLS-based classification methods using sparse partial least squares (SPLS). These sparse versions aim to achieve variable selection and dimension reduction simultaneously. We consider both binary and multicategory classification. We provide analytical and simulation-based insights into the variable selection properties of these approaches and benchmark them on well-known, publicly available datasets that involve tumor classification with high dimensional gene expression data. We show that incorporating SPLS into a generalized linear model (GLM) framework provides higher sensitivity in variable selection for multicategory classification with unbalanced sample sizes between classes. As the sample size increases, the two-stage approach provides comparable sensitivity with better specificity in variable selection. In binary classification and in multicategory classification with balanced sample sizes, the two-stage approach provides variable selection and prediction accuracy comparable to those of the GLM version and is computationally more efficient.
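To make "variable selection and dimension reduction simultaneously" concrete, the following is a minimal sketch, not the authors' implementation. It assumes a binary 0/1 response treated as continuous and a single latent component, and it obtains a sparse direction by soft-thresholding the covariance vector between the centered predictors and the centered response. The function name `sparse_pls_direction`, the threshold parameter `eta`, and the simulated data are all illustrative assumptions.

```python
import numpy as np

def sparse_pls_direction(X, y, eta=0.5):
    """Illustrative first sparse PLS direction (assumed form, one component).

    Soft-thresholds z = Xc^T yc at eta * max|z|, then rescales to unit norm.
    Predictors whose entry is thresholded to zero are dropped from the component.
    """
    Xc = X - X.mean(axis=0)          # center predictors column-wise
    yc = y - y.mean()                # center the (0/1) response
    z = Xc.T @ yc                    # covariance-like vector, one entry per predictor
    thresh = eta * np.max(np.abs(z))
    w = np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)  # soft-thresholding
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w

# Simulated data (hypothetical): 40 samples, 200 predictors, 5 informative ones.
rng = np.random.default_rng(0)
n, p = 40, 200
y = np.repeat([0.0, 1.0], n // 2)
X = rng.normal(size=(n, p))
X[:, :5] += 2.0 * y[:, None]         # shift the informative predictors in class 1

w = sparse_pls_direction(X, y, eta=0.5)
selected = np.flatnonzero(w)         # indices of predictors kept by the sparse direction
score = (X - X.mean(axis=0)) @ w     # one-dimensional latent score per sample
```

In a two-stage scheme of the kind the abstract mentions, a latent score like `score` would then be passed to a standard classifier; larger values of `eta` keep fewer predictors.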
