Article

High dimensional classification with combined adaptive sparse PLS and logistic regression

Journal

BIOINFORMATICS
Volume 34, Issue 3, Pages 485-493

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btx571

Funding

  1. French National Research Agency (ANR) as part of the 'Algorithmics, Bioinformatics and Statistics for Next Generation Sequencing data analysis' (ABS4NGS) ANR project [ANR-11-BINF-0001-06]
  2. 'MACARON' ANR project [ANR-14-CE23-0003]
  3. Agence Nationale de la Recherche (ANR) [ANR-14-CE23-0003] Funding Source: Agence Nationale de la Recherche (ANR)

Motivation: The high dimensionality of genomic data calls for the development of specific classification methodologies, in particular to prevent over-optimistic predictions. This challenge can be tackled by compression and variable selection, which together constitute a powerful framework for classification as well as for data visualization and interpretation. However, the combinations proposed so far lead to unstable and non-convergent methods because of inappropriate computational frameworks. We propose a computationally stable and convergent approach for high-dimensional classification based on sparse Partial Least Squares (sparse PLS).

Results: We first propose a new solution to the sparse PLS problem for univariate responses, based on proximal operators. We then develop an adaptive version of sparse PLS for classification, called logit-SPLS, which combines the iterative optimization of logistic regression with sparse PLS to ensure computational convergence and stability. Our results are confirmed on synthetic and experimental data. In particular, we show how crucial convergence and stability can be when cross-validation is used for calibration. Using gene expression data, we explore the prediction of breast cancer relapse. We also propose a multi-category version of our method, which we use to predict cell types from single-cell expression data.

Availability and implementation: Our approach is implemented in the plsgenomics R package.

Contact: ghislain.durif@inria.fr

Supplementary information: Supplementary data are available at Bioinformatics online.
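The abstract names two building blocks: a sparse PLS weight obtained through a proximal operator (univariate response), and the iterative optimization of logistic regression. The Python sketch below illustrates both under explicit assumptions. The sparse PLS weight is taken to be the covariance vector X^T y passed through the L1 proximal operator (soft-thresholding) and rescaled to unit norm. The logistic part is a single unpenalized IRLS linearization at fixed coefficients, which yields a working (pseudo-)response. The function names `soft_threshold`, `sparse_pls_weight` and `irls_pseudo_response`, as well as the threshold rule in the toy usage, are hypothetical. This is not the logit-SPLS algorithm from the paper, which uses an adaptive penalty and chains the two steps as described there, and it is not the plsgenomics API.

```python
import numpy as np


def soft_threshold(v, lam):
    """Proximal operator of lam * ||.||_1: shrink every entry toward 0 by lam."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


def sparse_pls_weight(X, y, lam):
    """Sparse weight vector of a first PLS component, univariate response (sketch).

    Assumes X (n x p) and y (n,) are already column-centered. The PLS
    covariance direction X^T y goes through the L1 proximal operator and is
    then rescaled to unit Euclidean norm (left as zeros if everything is cut).
    """
    w = soft_threshold(X.T @ y, lam)
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w


def irls_pseudo_response(X, y, beta):
    """One IRLS linearization step of logistic regression (hypothetical helper).

    y holds 0/1 labels. Returns the working response z and the observation
    weights p * (1 - p) evaluated at the current coefficients beta.
    """
    eta = X @ beta
    p = 1.0 / (1.0 + np.exp(-eta))
    weights = p * (1.0 - p)
    z = eta + (y - p) / weights
    return z, weights


# Toy usage: 200 predictors, only the first two carry signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 200))
labels = (X[:, 0] - X[:, 1] > 0).astype(float)
X = X - X.mean(axis=0)

# At beta = 0 all IRLS weights equal 0.25, so an unweighted PLS step is consistent.
z, weights = irls_pseudo_response(X, labels, np.zeros(X.shape[1]))
zc = z - z.mean()
# Threshold chosen (arbitrarily) as half of the largest absolute covariance.
lam = 0.5 * np.max(np.abs(X.T @ zc))
w = sparse_pls_weight(X, zc, lam)
print("selected predictors:", np.flatnonzero(w))
```

Soft-thresholding is the closed-form proximal step for an L1 penalty, which is why it keeps only predictors whose covariance with the (pseudo-)response exceeds `lam`. In practice `lam` would be calibrated, for example by cross-validation as the abstract discusses, rather than fixed by a rule of thumb.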
