☆ 4.8 Article

Higher criticism thresholding: Optimal feature selection when useful features are rare and weak

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA (2008)

期刊

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA

卷 105, 期 39, 页码 14790-14795

出版社

NATL ACAD SCIENCES

DOI: 10.1073/pnas.0807471105

关键词

false discovery rate; linear classification; threshold selection; rare/weak feature models

类别

Multidisciplinary Sciences

资金

National Science Foundation [DMS-0505303, DMS-0505423, DMS-0639980]
Direct For Mathematical & Physical Scien
Division Of Mathematical Sciences [0908613] Funding Source: National Science Foundation

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

摘要

In important application fields today-genomics and proteomics are examples-selecting a small subset of useful features is crucial for success of Linear Classification Analysis. We study feature selection by thresholding of feature Z-scores and introduce a principle of threshold selection, based on the notion of higher criticism (HQ. For i = 1, 2,..., p, let pi(i) denote the two-sided P-value associated with the ith feature Z-score and pi((i)) denote the ith order statistic of the collection of P-values. The HC threshold is the absolute Z-score corresponding to the P-value maximizing the HC objective (i/p -pi((i)))/root i/p(1-i/p). We consider a rare/weak (RW) feature model, where the fraction of useful features is small and the useful features are each too weak to be of much use on their own. HC thresholding (HCT) has interesting behavior in this setting, with an intimate link between maximizing the HC objective and minimizing the error rate of the designed classifier, and very different behavior from popular threshold selection procedures such as false discovery rate thresholding (FDRT). In the most challenging RW settings, HCT uses an unconventionally low threshold; this keeps the missed-feature detection rate under better control than FDRT and yields a classifier with improved misclassification performance. Replacing cross-validated threshold selection in the popular Shrunken Centroid classifier with the computationally less expensive and simpler HCT reduces the variance of the selected threshold and the error rate of the constructed classifier. Results on standard real datasets and in asymptotic theory confirm the advantages of HCT.

Higher criticism thresholding: Optimal feature selection when useful features are rare and weak

期刊

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA

出版社

NATL ACAD SCIENCES

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Higher criticism thresholding: Optimal feature selection when useful features are rare and weak

期刊

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA

出版社

NATL ACAD SCIENCES

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文