Article

Nonparametric estimation for big-but-biased data

Journal

TEST
Volume 30, Issue 4, Pages 861-883

Publisher

SPRINGER
DOI: 10.1007/s11749-020-00749-5

Keywords

Bandwidth selection; Big data; Kernel density estimation; Large sample size; Sampling bias

Funding

  1. MINECO through the European Regional Development Fund (ERDF) [MTM2017-82724-R]
  2. Xunta de Galicia (Grupos de Referencia Competitiva) through the European Regional Development Fund (ERDF) [ED431C-2016-015, ED431C-2020-14]
  3. Xunta de Galicia (Centro Singular de Investigacion de Galicia) through the European Regional Development Fund (ERDF) [ED431G/01]
  4. Xunta de Galicia (Centro de Investigacion del Sistema Universitario de Galicia) through the European Regional Development Fund (ERDF) [ED431G 2019/01]
  5. European Regional Development Fund (ERDF)

This paper investigates nonparametric estimation from a large sample subject to sampling bias and proposes a new estimator, built on kernel density estimation, that outperforms two classical estimators of the mean. Simulations show that the new estimator performs well when its two smoothing parameters are chosen suitably, and they illustrate how these parameters influence the final estimator.
This paper studies nonparametric estimation from a large sample subject to sampling bias. The general parameter considered is the mean of a transformation of the random variable of interest. Because the biasing weight function is unknown, a small simple random sample from the true population is assumed to be observed as well. A new nonparametric estimator that incorporates kernel density estimation is proposed. Asymptotic properties of this estimator are obtained under suitable limit conditions on the small and large sample sizes and under standard and non-standard asymptotic conditions on the two bandwidths. Explicit formulas are given for the particular case of mean estimation. Simulation results show that the new mean estimator outperforms two classical ones for suitable choices of the two smoothing parameters involved. The influence of the two smoothing parameters on the performance of the final estimator is also studied; their optimal values exhibit a striking limit behavior. The new method is applied to a real data set from the telecommunications company Vodafone ES, where a bootstrap algorithm is used to select the smoothing parameter.
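To make the setting concrete, the following is a minimal sketch of one natural way to combine the two samples with kernel density estimates. It is an illustration under stated assumptions, not necessarily the exact estimator of the paper. The idea is that the mean of u(X) under the true density f can be written as an average over the biased density g of u(y) f(y) / g(y), so f is estimated from the small simple random sample, g from the large biased sample, and the ratio is averaged over the large sample. The function names (gaussian_kde_at, rule_of_thumb_bandwidth, debiased_mean), the Gaussian kernel, and the normal-reference bandwidths are all assumptions made here; the abstract states only that a bootstrap algorithm selects the smoothing parameter in the real-data application.

```python
import numpy as np


def gaussian_kde_at(points, sample, bandwidth):
    """Evaluate a Gaussian kernel density estimate built from `sample` at `points`."""
    z = (points[:, None] - sample[None, :]) / bandwidth
    norm = len(sample) * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * z**2).sum(axis=1) / norm


def rule_of_thumb_bandwidth(sample):
    """Normal-reference bandwidth; an assumption here, not the paper's selector."""
    return 1.06 * np.std(sample, ddof=1) * len(sample) ** (-0.2)


def debiased_mean(biased, srs, u=lambda x: x, h=None, b=None):
    """Average u(Y_i) * f_hat(Y_i) / g_hat(Y_i) over the large biased sample.

    f_hat uses the small simple random sample with bandwidth h;
    g_hat uses the large biased sample with bandwidth b.
    """
    h = rule_of_thumb_bandwidth(srs) if h is None else h
    b = rule_of_thumb_bandwidth(biased) if b is None else b
    f_hat = gaussian_kde_at(biased, srs, h)
    # g_hat is strictly positive at every Y_i because each point counts itself.
    g_hat = gaussian_kde_at(biased, biased, b)
    return float(np.mean(u(biased) * f_hat / g_hat))


# Synthetic illustration (hypothetical data): the true population is N(0, 1),
# and a weight proportional to exp(x) turns the observed density into N(1, 1).
rng = np.random.default_rng(0)
srs = rng.normal(0.0, 1.0, size=200)      # small simple random sample
biased = rng.normal(1.0, 1.0, size=2000)  # large biased sample

print("naive mean of biased sample:", biased.mean())
print("mean of small sample alone: ", srs.mean())
print("KDE-ratio estimate:         ", debiased_mean(biased, srs))
```

In this synthetic setup the naive mean of the large sample sits near 1 rather than the true value 0, while the ratio-based estimate moves back toward 0. How much it gains over the small-sample mean depends on the two bandwidths h and b, which is the dependence the simulations in the paper examine.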
