Review

A selective overview of feature screening for ultrahigh-dimensional data

Journal

SCIENCE CHINA-MATHEMATICS
Volume 58, Issue 10, Pages 2033-2054

Publisher

SCIENCE PRESS
DOI: 10.1007/s11425-015-5062-9

Keywords

correlation learning; distance correlation; sure independence screening; sure joint screening; sure screening property; ultrahigh-dimensional data

Funding

  1. National Natural Science Foundation of China [11401497, 11301435]
  2. Fundamental Research Funds for the Central Universities [T2013221043, 20720140034]
  3. Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry
  4. National Institute on Drug Abuse, National Institutes of Health [P50 DA036107, P50 DA039838]
  5. National Science Foundation [DMS1512422]
  6. Direct For Mathematical & Physical Scien
  7. National Science Foundation, Division of Mathematical Sciences [1512422]


High-dimensional data are now collected routinely in many scientific areas, including genome-wide association studies, biomedical imaging, tomography, tumor classification, and finance. Analyzing such data poses many challenges for statisticians, and feature selection (variable selection) is fundamental to the task. The sparsity principle, which assumes that only a small number of predictors contribute to the response, is frequently adopted and has proved useful in high-dimensional analysis. Following this principle, a large number of variable selection approaches based on penalized least squares or penalized likelihood have been developed in the recent literature to estimate a sparse model and select significant variables simultaneously. Although penalized variable selection methods have been applied successfully in many high-dimensional analyses, modern applications in areas such as genomics and proteomics push the dimensionality to an even larger scale, where the dimension may grow exponentially with the sample size. Such data are called ultrahigh-dimensional in the literature. This work presents a selective overview of feature screening procedures for ultrahigh-dimensional data. We focus on insights into how to construct marginal utilities for feature screening under specific models, and on the motivation for model-free feature screening procedures.
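
To make the idea of a marginal utility concrete, the following is a minimal sketch of correlation-based screening in the spirit of sure independence screening (one of the keywords above). It is an illustration under stated assumptions, not the procedure of any particular paper covered by the review: the function names `marginal_correlations`, `sis_screen`, and `simulate` are hypothetical, the utility is the absolute Pearson correlation between each predictor and the response, and the default submodel size d = n / log(n) is one commonly used convention rather than a requirement.

```python
import math
import random


def marginal_correlations(X, y):
    """Absolute Pearson correlation between each column of X and y.

    X is a list of n rows, each a list of p floats; y is a list of n floats.
    """
    n = len(y)
    p = len(X[0])
    y_mean = sum(y) / n
    y_centered = [v - y_mean for v in y]
    y_norm = math.sqrt(sum(v * v for v in y_centered))
    utilities = []
    for j in range(p):
        col = [row[j] for row in X]
        c_mean = sum(col) / n
        c_centered = [v - c_mean for v in col]
        c_norm = math.sqrt(sum(v * v for v in c_centered))
        if c_norm == 0.0 or y_norm == 0.0:
            # A constant column (or response) carries no marginal signal.
            utilities.append(0.0)
            continue
        cov = sum(a * b for a, b in zip(c_centered, y_centered))
        utilities.append(abs(cov / (c_norm * y_norm)))
    return utilities


def sis_screen(X, y, d=None):
    """Return the sorted indices of the d predictors with largest utility.

    Assumption: when d is not given, use d = floor(n / log(n)).
    """
    n = len(y)
    if d is None:
        d = max(1, int(n / math.log(n)))
    utilities = marginal_correlations(X, y)
    ranked = sorted(range(len(utilities)), key=lambda j: utilities[j], reverse=True)
    return sorted(ranked[:d])


def simulate(n=100, p=1000, active=(0, 1, 2), seed=0):
    """Toy sparse linear model: only the `active` columns affect y."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [sum(3.0 * row[j] for j in active) + rng.gauss(0.0, 1.0) for row in X]
    return X, y


if __name__ == "__main__":
    X, y = simulate()
    kept = sis_screen(X, y)
    print(len(kept), "predictors retained out of", len(X[0]))
```

The screening step only reduces the dimension from p to d; in practice a penalized method of the kind mentioned in the abstract would then be fitted on the retained predictors. Because this utility measures linear marginal association, it can miss predictors that matter only jointly or nonlinearly, which is part of the motivation for the model-free procedures the abstract refers to.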
