4.8 Article

Impossibility of successful classification when useful features are rare and weak

Publisher

NATL ACAD SCIENCES
DOI: 10.1073/pnas.0903931106

Keywords

higher criticism; phase diagram; region of impossibility; region of possibility; threshold feature selection

Funding

  1. National Science Foundation [DMS-0908613]
  2. Direct For Mathematical & Physical Scien
  3. Division Of Mathematical Sciences [0908613] Funding Source: National Science Foundation

We study a two-class classification problem with a large number of features, many of which are useless and only a few of which are useful; we do not know in advance which are which. The number of features is large compared with the number of training observations. We calibrate the model with 4 key parameters (the number of features, the size of the training sample, and the fraction and strength of the useful features) and identify a region in parameter space where no trained classifier can reliably separate the two classes on fresh data. The complement of this region, where successful classification is possible, is also briefly discussed.
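The setting described in the abstract can be explored with a small simulation. The sketch below is only an illustration under stated assumptions: the Gaussian noise model, the parameter names `p`, `n`, `eps`, and `mu`, the fixed `threshold`, and the helper `simulate_error` are all hypothetical choices made here, not the paper's calibration or its method. It applies simple threshold feature selection to training Z-scores and measures the error rate on fresh data.

```python
import numpy as np

def simulate_error(p, n, eps, mu, threshold, n_test=2000, seed=0):
    """Estimate the test error of a thresholded sign classifier.

    Assumed toy model (not taken from the paper): labels are +/-1 and each
    feature equals label * contrast + standard normal noise, where contrast
    is mu for a random fraction eps of features and 0 for the rest.
    """
    rng = np.random.default_rng(seed)

    # Each feature is useful independently with probability eps.
    useful = rng.random(p) < eps
    contrast = np.where(useful, mu, 0.0)

    # Training sample: n observations of p features.
    y_train = rng.choice([-1.0, 1.0], size=n)
    x_train = y_train[:, None] * contrast + rng.standard_normal((n, p))

    # Per-feature Z-scores: N(sqrt(n) * mu, 1) if useful, N(0, 1) otherwise.
    z = (y_train @ x_train) / np.sqrt(n)

    # Threshold feature selection: keep the sign of features whose |Z| is large.
    weights = np.sign(z) * (np.abs(z) > threshold)

    # Fresh data drawn from the same model.
    y_test = rng.choice([-1.0, 1.0], size=n_test)
    x_test = y_test[:, None] * contrast + rng.standard_normal((n_test, p))

    # Classify by the sign of the weighted sum of selected features.
    pred = np.where(x_test @ weights >= 0, 1.0, -1.0)
    return float(np.mean(pred != y_test))

# Useful features that are fairly common and strong versus rare and weak.
strong = simulate_error(p=1000, n=100, eps=0.05, mu=0.5, threshold=3.0)
weak = simulate_error(p=1000, n=100, eps=0.005, mu=0.02, threshold=3.0)
print(f"strong-signal error: {strong:.3f}, weak-signal error: {weak:.3f}")
```

In this toy setup, the rare-and-weak configuration should give an error rate close to that of random guessing, while the stronger configuration separates the classes well. This only shows one classifier failing; the paper's claim is stronger, namely that no trained classifier succeeds inside the region of impossibility.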
