4.4 Article

TESTING SIGNIFICANCE OF FEATURES BY LASSOED PRINCIPAL COMPONENTS

Journal

ANNALS OF APPLIED STATISTICS
Volume 2, Issue 3, Pages 986-1012

Publisher

INST MATHEMATICAL STATISTICS
DOI: 10.1214/08-AOAS182

Keywords

Microarray; gene expression; multiple testing; feature selection

Funding

  1. National Defense Science and Engineering Graduate Fellowship
  2. NSF [DMS-99-71405]
  3. National Institutes of Health [N01-HV-28183]
  4. NATIONAL INSTITUTE OF BIOMEDICAL IMAGING AND BIOENGINEERING [R01EB001988] Funding Source: NIH RePORTER

We consider the problem of testing the significance of features in high-dimensional settings. In particular, we test for differentially expressed genes in a microarray experiment. We wish to identify genes that are associated with some type of outcome, such as survival time or cancer type. We propose a new procedure, called Lassoed Principal Components (LPC), that builds upon existing methods and can provide a sizable improvement. For instance, in the case of two-class data, a standard (albeit simple) approach might be to compute a two-sample t-statistic for each gene. The LPC method involves projecting these conventional gene scores onto the eigenvectors of the gene expression data covariance matrix and then applying an L1 penalty in order to de-noise the resulting projection. We present a theoretical framework under which LPC is the logical choice for identifying significant genes, and we show that LPC can provide a marked reduction in false discovery rates over the conventional methods on both real and simulated data. Moreover, this flexible procedure can be applied to a variety of types of data and can be used to improve many existing methods for the identification of significant features.
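
The projection-and-penalty step described in the abstract can be illustrated with a minimal sketch. It is a reading of the abstract only, not the authors' implementation. Several choices here are assumptions: the function names (`two_sample_t`, `soft_threshold`, `lpc_scores`), the use of Welch's unequal-variance t-statistic, the genes-by-samples layout of `X`, and the fixed penalty `lam`. The abstract does not say how the penalty is chosen. The sketch relies on one standard fact: when the regressors are orthonormal, as the eigenvectors are, the L1-penalized least-squares fit reduces to soft-thresholding the projection coefficients.

```python
import numpy as np


def two_sample_t(X, y):
    """Welch two-sample t-statistic for each gene (row of X); y holds 0/1 labels."""
    a, b = X[:, y == 0], X[:, y == 1]
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1]
                 + b.var(axis=1, ddof=1) / b.shape[1])
    return (b.mean(axis=1) - a.mean(axis=1)) / se


def soft_threshold(z, lam):
    """Shrink each entry of z toward zero by lam; entries within lam of zero become 0."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def lpc_scores(X, scores, lam):
    """Project gene scores onto covariance eigenvectors, then apply an L1 penalty.

    X is genes x samples. The left singular vectors of the row-centered X are
    the eigenvectors of the gene-by-gene covariance matrix.
    """
    Xc = X - X.mean(axis=1, keepdims=True)        # center each gene across samples
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    U = U[:, s > 1e-10 * s[0]]                    # drop numerically null directions
    coef = soft_threshold(U.T @ scores, lam)      # lasso solution for orthonormal U
    return U @ coef                               # de-noised gene scores


# Hypothetical toy data: 200 genes, 20 samples, first 10 genes shifted in class 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = np.array([0] * 10 + [1] * 10)
X[:10, 10:] += 2.0

t = two_sample_t(X, y)
lpc = lpc_scores(X, t, lam=1.0)
```

With `lam=0` the result is simply the projection of the t-statistics onto the span of the eigenvectors. Larger values of `lam` zero out the components whose projection coefficients are small.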
