Article

Two-Stage Procedures for High-Dimensional Data

Publisher

TAYLOR & FRANCIS INC
DOI: 10.1080/07474946.2011.619088

Keywords

Asymptotic normality; Classification; Confidence region; HDLSS; Lasso; Pathway analysis; Regression; Sample size determination; Testing equality of covariance matrices; Two-sample test; Variable selection

Funding

  1. Japan Society for the Promotion of Science (JSPS) [22300094]
  2. Grants-in-Aid for Scientific Research [23340022] Funding Source: KAKEN


In this article, we consider a variety of inference problems for high-dimensional data. Our purpose is to suggest directions for future research and possible solutions to p ≫ n problems through new types of two-stage estimation methodologies. This is the first attempt to apply sequential analysis to high-dimensional statistical inference with prespecified accuracy. We determine the sample size required for these inference problems by constructing new types of multivariate two-stage procedures. The key idea underlying both the theory and the methodology is asymptotic normality as p → ∞. Building on this asymptotic normality, we first give (a) a given-bandwidth confidence region for the squared loss. We then give (b) a two-sample test that assures prespecified size and power simultaneously, together with (c) a procedure for testing the equality of two covariance matrices. We also give (d) a two-stage discriminant procedure that keeps the misclassification rates no greater than a prespecified value. Moreover, we propose (e) a two-stage variable selection procedure that screens variables in the first stage and, in the second stage, selects a significant set of associated variables from among the candidate variables. Following this variable selection procedure, we consider (f) variable selection for high-dimensional regression, which compares favorably with the lasso in terms of assured accuracy and computational cost. Further, we consider variable selection for classification and propose (g) a two-stage discriminant procedure applied after screening variables. Finally, we consider (h) pathway analysis for high-dimensional data by constructing a multiple test of correlation coefficients.

