Article

OPTIMAL CROSS-VALIDATION IN DENSITY ESTIMATION WITH THE L2-LOSS

Journal

ANNALS OF STATISTICS
Volume 42, Issue 5, Pages 1879-1910

Publisher

INST MATHEMATICAL STATISTICS
DOI: 10.1214/14-AOS1240

Keywords

Cross-validation; leave-p-out; resampling; risk estimation; model selection; density estimation; oracle inequality; projection estimators; concentration inequalities

Funding

  1. French Agence Nationale de la Recherche (ANR) [ANR-09-JCJC-0027-01, ANR-11-BS01-0010]
  2. Agence Nationale de la Recherche (ANR) [ANR-09-JCJC-0027, ANR-11-BS01-0010]

We analyze the performance of cross-validation (CV) in the density estimation framework with two purposes: (i) risk estimation and (ii) model selection. The main focus is on the so-called leave-p-out CV procedure (Lpo), where p denotes the cardinality of the test set. Closed-form expressions are derived for the Lpo estimator of the risk of projection estimators. These expressions provide a great improvement upon V-fold cross-validation in terms of variability and computational complexity. From a theoretical point of view, the closed-form expressions also make it possible to study the performance of Lpo in terms of risk estimation. The optimality of leave-one-out (Loo), that is, Lpo with p = 1, is proved among CV procedures used for risk estimation. Two model selection frameworks are also considered: estimation, as opposed to identification. For estimation with finite sample size n, optimality is achieved for p large enough [with p/n = o(1)] to balance the overfitting resulting from the structure of the model collection. For identification, model selection consistency is established for Lpo as long as p/n is conveniently related to the rate of convergence of the best estimator in the collection: (i) p/n -> 1 as n -> +infinity with a parametric rate, and (ii) p/n = o(1) with some nonparametric estimators. These theoretical results are validated by simulation experiments.
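
To make the idea of a closed-form Lpo risk estimator concrete, the following is a minimal sketch for one simple projection estimator: a regular histogram on [0, 1). It is an illustration under stated assumptions, not the paper's code. The function names, the choice of support, and the toy data are hypothetical, and the closed-form expression is a direct derivation for this special case (from hypergeometric moments of the training-set bin counts) rather than a formula quoted from the article. Both functions estimate the L2 risk up to the additive constant given by the squared norm of the true density, which does not depend on the model.

```python
from itertools import combinations


def bin_indices(xs, n_bins):
    """Bin index of each point for a regular histogram on [0, 1) (assumed support)."""
    return [min(int(x * n_bins), n_bins - 1) for x in xs]


def lpo_closed_form(xs, n_bins, p):
    """Closed-form leave-p-out criterion for a regular histogram, 1 <= p <= n - 1.

    Derived for this sketch from the hypergeometric moments of the
    training-set bin counts; cost is linear in n, independent of p.
    """
    n = len(xs)
    h = 1.0 / n_bins  # common bin width
    counts = [0] * n_bins
    for j in bin_indices(xs, n_bins):
        counts[j] += 1
    sum_sq = sum(c * c for c in counts)
    constant = (2 * n - p) / ((n - 1) * (n - p) * h)
    weight = (n - p + 1) / (n * (n - 1) * (n - p) * h)
    return constant - weight * sum_sq


def lpo_brute_force(xs, n_bins, p):
    """Same criterion by enumerating every test set of size p (small n only)."""
    n = len(xs)
    m = n - p  # training-set size
    h = 1.0 / n_bins
    bins = bin_indices(xs, n_bins)
    total = 0.0
    n_splits = 0
    for test in combinations(range(n), p):
        held_out = set(test)
        train_counts = [0] * n_bins
        for i in range(n):
            if i not in held_out:
                train_counts[bins[i]] += 1
        # Integral of the squared histogram built on the training points.
        sq_norm = sum(c * c for c in train_counts) / (m * m * h)
        # Average value of that histogram at the held-out points.
        cross = sum(train_counts[bins[i]] / (m * h) for i in test) / p
        total += sq_norm - 2.0 * cross
        n_splits += 1
    return total / n_splits


# Hypothetical toy sample, used only to compare the two computations.
sample = [0.05, 0.12, 0.33, 0.41, 0.48, 0.77, 0.81, 0.95]
for p in (1, 2, 3):
    print(p, lpo_closed_form(sample, 4, p), lpo_brute_force(sample, 4, p))
```

The brute-force version visits every one of the "n choose p" splits, whereas the closed form only needs the bin counts. In this special case, that is the kind of computational saving the abstract refers to. Model selection would then amount to minimizing the criterion over candidate values of `n_bins`.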

