Article

Stabilized Nearest Neighbor Classifier and its Statistical Properties

Journal

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
Volume 111, Issue 515, Pages 1254-1265

Publisher

AMER STATISTICAL ASSOC
DOI: 10.1080/01621459.2015.1089772

Keywords

Bayes risk; Classification; Margin condition; Minimax optimality; Reproducibility; Stability

Funding

  1. Simons Foundation [246649]
  2. NSF [DMS-1151692, DMS-1418042]
  3. Simons Fellowship in Mathematics
  4. Office of Naval Research [ONR N00014-15-1-2331]
  5. Indiana Clinical and Translational Sciences Institute

Abstract

The stability of a statistical analysis is an important indicator of reproducibility, which is one of the main principles of the scientific method. It entails that similar statistical conclusions can be reached from independent samples drawn from the same underlying population. In this article, we introduce a general measure of classification instability (CIS) to quantify the sampling variability of the prediction made by a classification method. Interestingly, the asymptotic CIS of any weighted nearest neighbor classifier turns out to be proportional to the Euclidean norm of its weight vector. Based on this concise form, we propose a stabilized nearest neighbor (SNN) classifier, which distinguishes itself from other nearest neighbor classifiers by taking stability into consideration. In theory, we prove that SNN attains the minimax optimal convergence rate in risk and a sharp convergence rate in CIS; the latter rate result is established for general plug-in classifiers under a low-noise condition. Extensive simulated and real examples demonstrate that SNN achieves a considerable improvement in CIS over existing nearest neighbor classifiers, with comparable classification accuracy. We implement the algorithm in the publicly available R package snn. Supplementary materials for this article are available online.
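
The following is a minimal Monte Carlo sketch of one natural way to estimate such an instability measure for a k-nearest neighbor classifier, assuming (as suggested by the abstract) that instability is read as the probability that classifiers trained on two independent samples from the same population disagree at a fresh query point. The Gaussian-mixture data model and all parameter choices below are illustrative assumptions, not the paper's simulation design or the snn package's implementation.

```python
# Monte Carlo sketch of a classification-instability estimate for k-NN.
# Assumption: instability = chance that two classifiers, trained on
# independent samples from the same population, disagree at a new
# query point. The two-class Gaussian mixture is purely illustrative.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)


def draw_sample(n):
    """Two-class Gaussian mixture in R^2; class 1 is shifted by (1, 1)."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, 2)) + y[:, None]
    return X, y


def estimate_instability(k, n_train=200, n_query=500, reps=50):
    """Average disagreement between k-NN fits on independent samples."""
    disagree = []
    for _ in range(reps):
        X1, y1 = draw_sample(n_train)   # first training sample
        X2, y2 = draw_sample(n_train)   # independent second sample
        Xq, _ = draw_sample(n_query)    # fresh query points
        f1 = KNeighborsClassifier(n_neighbors=k).fit(X1, y1)
        f2 = KNeighborsClassifier(n_neighbors=k).fit(X2, y2)
        disagree.append(np.mean(f1.predict(Xq) != f2.predict(Xq)))
    return float(np.mean(disagree))


if __name__ == "__main__":
    for k in (1, 5, 25):
        print(f"k={k:>2}  estimated instability ~ {estimate_instability(k):.3f}")
```

Standard k-NN weights its k neighbors equally, so its weight vector has Euclidean norm 1/sqrt(k); the abstract's proportionality result therefore suggests instability shrinking at that order as k grows, and the disagreement rates printed by this sketch typically decrease with k in line with that intuition.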
