Article

Threshold Selection in Feature Screening for Error Rate Control

Journal

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
Volume 118, Issue 543, Pages 1773-1785

Publisher

TAYLOR & FRANCIS INC
DOI: 10.1080/01621459.2021.2011735

Keywords

Empirical distribution; False discovery rate; Feature screening; Per family error rate; Symmetry

The hard thresholding rule is commonly used in feature screening for ultrahigh-dimensional data, but choosing the threshold is difficult. This study introduces a data-adaptive threshold selection procedure that, under certain conditions, asymptotically controls the false discovery rate and the per family error rate while retaining all important predictors.
The hard thresholding rule is commonly adopted in feature screening procedures to screen out unimportant predictors for ultrahigh-dimensional data. However, different screening problems call for different thresholds, and an appropriate threshold usually varies with the model and the error distribution. With an ad hoc choice, it is unclear whether all of the important predictors are selected, and the procedure is very likely to include many unimportant features. We introduce a data-adaptive threshold selection procedure with error rate control, which is applicable to most popular screening methods. The key idea is to apply a sample-splitting strategy to construct a series of statistics with a marginal symmetry property, and then to use that symmetry to approximate the number of false discoveries. We show that, under certain conditions, the proposed method asymptotically controls the false discovery rate and the per family error rate while still retaining all of the important predictors. Three important examples are presented to illustrate the merits of the newly proposed procedures. Numerical experiments indicate that the proposed methodology works well for many existing screening methods. Supplementary materials for this article are available online.
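The key idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not specify how the symmetric statistics are built, so the construction below (the product of marginal Pearson correlations computed on two random halves of the sample) is an assumption chosen for illustration, and the function names `marginal_corr`, `symmetric_stats`, and `select_threshold` are hypothetical. The sketch only shows how symmetry about zero lets the count of large negative statistics stand in for the unknown number of false discoveries among the large positive ones.

```python
import numpy as np


def marginal_corr(X, y):
    """Pearson correlation between each column of X and the response y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))


def symmetric_stats(X, y, rng):
    """Assumed construction: multiply the marginal utilities from two halves.

    For an unimportant feature the two correlations are roughly independent
    and centered at zero, so their product is roughly symmetric about zero.
    For an important feature both share a sign, so the product is positive.
    """
    n = X.shape[0]
    perm = rng.permutation(n)
    first, second = perm[: n // 2], perm[n // 2:]
    return marginal_corr(X[first], y[first]) * marginal_corr(X[second], y[second])


def select_threshold(w, alpha):
    """Smallest t > 0 whose estimated false discovery proportion is <= alpha.

    The number of false discoveries in {w_j >= t} is approximated, via
    symmetry, by the number of statistics with w_j <= -t.
    """
    w = np.asarray(w, dtype=float)
    for t in np.sort(np.abs(w[w != 0])):
        est_false = np.sum(w <= -t)
        n_selected = max(np.sum(w >= t), 1)
        if est_false / n_selected <= alpha:
            return float(t)
    return float("inf")  # no threshold meets the target level
```

As a toy check, for the statistics `[5, 4, 3, 2, -1, 1, -0.5, 0.5]` and `alpha = 0.25`, `select_threshold` returns `1.0`: at that threshold one statistic lies at or below -1 and five lie at or above 1, giving an estimated proportion of 0.2. Features with `w >= threshold` would then be retained.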
