Article

Novel kernel density estimator based on ensemble unbiased cross-validation

Journal

INFORMATION SCIENCES
Volume 581, Pages 327-344

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2021.09.045

Keywords

Probability density function; Kernel density estimator; Bandwidth; Unbiased cross-validation; Data-block based UCV; Data-point based UCV

Funding

  1. National Natural Science Foundation of China [61972261]
  2. Basic Research Foundation of Shenzhen [20210312191246002]
  3. Scientific Research Foundation of Shenzhen University [860/000002110628]
  4. Key R&D Program of Science and Technology Foundation of Hebei Province [19210310D]
  5. Natural Science Foundation of Hebei Province [F2021201020]

Unbiased cross-validation (UCV) is a commonly used method for selecting the optimal bandwidth of the kernel density estimator (KDE), which estimates the underlying probability density function (PDF) of a given data set. Since the UCV method was proposed, few studies have pointed out its instability when determining the KDE bandwidth. Following the principle of stability improvement, this paper presents a novel ensemble UCV-based KDE (EUCV-KDE), which determines the expectation of an estimated PDF using an ensemble of data-block-based UCVs rather than a single data-point-based UCV. To derive the optimal bandwidth, a novel objective function is designed for EUCV-KDE that considers the empirical and structural risk of KDE together. We validate the rationality and effectiveness of EUCV-KDE on 10 probability distributions. The experimental results show that EUCV-KDE converges as the number of data-block-based UCVs increases and achieves more stable and better prediction performance than both the classical UCV-KDE and the revisited cross-validation (RCV) based KDE (RCV-KDE). In addition, a real-world application based on UK climate data further validates the effectiveness of EUCV-KDE by using it to determine the optimal bandwidth for the Nadaraya-Watson kernel regression estimator.
© 2021 Elsevier Inc. All rights reserved.
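To make the baseline concrete, here is a minimal sketch of classical (data-point-based) UCV bandwidth selection for a one-dimensional KDE, assuming a Gaussian kernel and a grid search over candidate bandwidths. The criterion is UCV(h) = ∫ f̂_h² dx − (2/n) Σ_i f̂_{h,−i}(X_i), where f̂_{h,−i} is the leave-one-out estimate. The second function, `block_ensemble_bandwidth`, is a hypothetical illustration of the general idea of averaging UCV over random data blocks; it is a generic subsample-averaged UCV, NOT the paper's EUCV-KDE objective (which also includes a structural-risk term not described in this abstract). All function names, the block size, and the n^(-1/5) rescaling step are assumptions made for this sketch.

```python
import numpy as np

SQRT_2PI = np.sqrt(2.0 * np.pi)


def gauss(u):
    """Standard normal density (the assumed Gaussian kernel)."""
    return np.exp(-0.5 * u * u) / SQRT_2PI


def ucv_score(x, h):
    """Classical UCV criterion for a 1-D Gaussian KDE with bandwidth h."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x[:, None] - x[None, :]  # pairwise differences X_i - X_j
    # Integral of fhat^2: two N(0, h^2) kernels convolve to N(0, 2 h^2).
    s = np.sqrt(2.0) * h
    int_f2 = gauss(d / s).sum() / (n * n * s)
    # Leave-one-out term: average over i of fhat_{-i}(X_i),
    # obtained by dropping the diagonal (i == j) kernel values.
    k = gauss(d / h) / h
    loo = (k.sum() - np.trace(k)) / (n * (n - 1))
    return int_f2 - 2.0 * loo


def ucv_bandwidth(x, grid):
    """Pick the grid bandwidth that minimises the classical UCV score."""
    grid = np.asarray(grid, dtype=float)
    scores = np.array([ucv_score(x, h) for h in grid])
    return grid[np.argmin(scores)]


def block_ensemble_bandwidth(x, grid, n_blocks=10, block_size=None, seed=0):
    """HYPOTHETICAL illustration: average UCV curves over random data blocks.

    Not the paper's EUCV-KDE objective. Each block is a random subsample
    drawn without replacement; the UCV curves are averaged, minimised, and
    the resulting block-size bandwidth is rescaled to the full sample size
    using the usual h ~ n^(-1/5) rate for second-order kernels.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = x.size
    m = block_size if block_size is not None else max(2, n // 2)
    curves = []
    for _ in range(n_blocks):
        block = rng.choice(x, size=m, replace=False)
        curves.append([ucv_score(block, h) for h in grid])
    mean_curve = np.mean(curves, axis=0)
    h_block = grid[np.argmin(mean_curve)]
    return h_block * (m / n) ** 0.2  # map bandwidth from size m to size n


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=300)
    grid = np.linspace(0.05, 1.5, 60)
    print("UCV bandwidth:", ucv_bandwidth(data, grid))
    print("block-ensemble bandwidth:", block_ensemble_bandwidth(data, grid))
```

The closed-form ∫ f̂_h² term avoids numerical integration, which is why the Gaussian kernel was chosen here; other kernels would need their own self-convolution. Averaging over blocks is one simple way to reduce the variance of a single UCV curve, which is the instability the abstract refers to, but the paper's actual ensemble construction and objective may differ.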
