Journal
ANNALS OF APPLIED STATISTICS
Volume 2, Issue 3, Pages 986-1012
Publisher
INST MATHEMATICAL STATISTICS
DOI: 10.1214/08-AOAS182
Keywords
Microarray; gene expression; multiple testing; feature selection
Funding
- National Defense Science and Engineering Graduate Fellowship
- NSF [DMS-99-71405]
- National Institutes of Health [N01-HV-28183]
- NATIONAL INSTITUTE OF BIOMEDICAL IMAGING AND BIOENGINEERING [R01EB001988] Funding Source: NIH RePORTER
We consider the problem of testing the significance of features in high-dimensional settings. In particular, we test for differentially expressed genes in a microarray experiment. We wish to identify genes that are associated with some type of outcome, such as survival time or cancer type. We propose a new procedure, called Lassoed Principal Components (LPC), that builds upon existing methods and can provide a sizable improvement. For instance, in the case of two-class data, a standard (albeit simple) approach might be to compute a two-sample t-statistic for each gene. The LPC method involves projecting these conventional gene scores onto the eigenvectors of the gene expression data covariance matrix and then applying an L1 penalty in order to de-noise the resulting projection. We present a theoretical framework under which LPC is the logical choice for identifying significant genes, and we show that LPC can provide a marked reduction in false discovery rates over the conventional methods on both real and simulated data. Moreover, this flexible procedure can be applied to a variety of types of data and can be used to improve many existing methods for the identification of significant features.
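The projection-and-penalty step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`two_sample_t`, `soft_threshold`, `lpc_scores`), the use of a Welch t-statistic, the penalty parameter `lam`, and the choice to keep every eigenvector with a nonzero singular value are all assumptions made for illustration. Because the eigenvectors are orthonormal, the L1-penalized regression of the gene scores on them reduces to soft-thresholding the projection coefficients, which is what the sketch relies on.

```python
import numpy as np

def two_sample_t(X, y):
    """Welch two-sample t-statistic per gene (assumed scoring rule).

    X: samples x genes expression matrix; y: array of 0/1 class labels.
    """
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return (b.mean(axis=0) - a.mean(axis=0)) / se

def soft_threshold(z, lam):
    """Shrink each entry of z toward zero by lam (lasso solution for an orthonormal design)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lpc_scores(X, y, lam):
    """Hypothetical LPC-style scores: project t-statistics onto eigenvectors, then apply an L1 penalty."""
    t = two_sample_t(X, y)
    Xc = X - X.mean(axis=0)                      # center each gene
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    keep = s > 1e-10 * s[0]                      # drop numerically null directions
    V = Vt[keep].T                               # genes x components; covariance eigenvectors
    coef = soft_threshold(V.T @ t, lam)          # L1-penalized projection coefficients
    return V @ coef, t                           # de-noised scores and the raw t-statistics
```

Calling `lpc_scores(X, y, lam)` returns one de-noised score per gene alongside the raw t-statistics. With `lam = 0` the scores are the plain projection of the t-statistics onto the eigenvectors; larger values of `lam` zero out more components. How the penalty and the number of eigenvectors should be chosen is not specified in the abstract and is left open here.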