Article

Moment-based density estimation of confidential micro-data: a computational statistics approach

Journal

STATISTICS AND COMPUTING
Volume 33, Issue 1

Publisher

SPRINGER
DOI: 10.1007/s11222-022-10203-1

Keywords

Density estimation; Confidential data; Moment problems; Multidimensional approximations

To protect privacy, synthetic micro-data is often used instead of confidential data. Accurately estimating the density function based on sample micro-data is important for the synthetic data to be useful for analysis.
Providing access to synthetic micro-data in place of confidential data is a common way to protect the privacy of participants. For the synthetic data to be useful for analysis, its density function must closely approximate that of the confidential data. Hence, accurately estimating the density function from sample micro-data is important. Existing kernel-based, copula-based, and machine learning methods of joint density estimation may not be viable. Applying the multivariate moment problem to sample-based density estimation has long been considered impractical, both because of its computational complexity and because optimal parameter selection for the density estimate is intractable when the true joint density function is unknown. This paper introduces a generalised form of the sample moment-based density estimate, which can be used to estimate joint density functions when only the empirical moments are available. We demonstrate optimal parametrisation of the moment-based density estimate based solely on sample data by employing a computational strategy for parameter selection. We compare the performance of the moment-based estimate with that of existing non-parametric and parametric density estimation methods. The results show that empirical moments can provide a reasonable, robust non-parametric approximation of a joint density function that is comparable to existing non-parametric methods. We provide an example of synthetic data generation from the moment-based density estimate and show that the resulting synthetic data provide a reasonable disclosure-protected alternative for public release.
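To make the idea of a moment-based density estimate concrete, the following is a minimal univariate sketch, not the paper's generalised multivariate estimator. It assumes the data have been rescaled to [-1, 1] and approximates the density by a truncated Legendre series whose coefficients are computed from raw sample moments alone. The function names and the truncation order `max_order` are hypothetical choices for illustration; the paper's computational strategy for parameter selection is not reproduced here.

```python
import numpy as np
from numpy.polynomial import legendre as L


def empirical_moments(x, max_order):
    """Raw sample moments m_0, ..., m_K of data already scaled to [-1, 1]."""
    return np.array([np.mean(x ** k) for k in range(max_order + 1)])


def legendre_coefficients(moments):
    """Series coefficients c_k = (2k + 1) / 2 * E[P_k(X)], from raw moments only.

    E[P_k(X)] is obtained by writing the Legendre polynomial P_k in the
    monomial basis and replacing each power x^j with the moment m_j.
    """
    max_order = len(moments) - 1
    coefs = np.zeros(max_order + 1)
    for k in range(max_order + 1):
        unit = np.zeros(k + 1)
        unit[k] = 1.0
        power_basis = L.leg2poly(unit)  # monomial coefficients of P_k
        coefs[k] = (2 * k + 1) / 2.0 * np.dot(power_basis, moments[: k + 1])
    return coefs


def moment_density(coefs, grid):
    """Evaluate the truncated Legendre series on a grid inside [-1, 1]."""
    return L.legval(grid, coefs)


# Illustrative data only: a Beta(2, 5) sample mapped from [0, 1] to [-1, 1].
rng = np.random.default_rng(0)
sample = rng.beta(2.0, 5.0, size=5000) * 2.0 - 1.0

moments = empirical_moments(sample, max_order=6)  # max_order is an assumed choice
coefs = legendre_coefficients(moments)
grid = np.linspace(-1.0, 1.0, 2001)
density = moment_density(coefs, grid)
```

Because the zeroth moment is 1, the estimate integrates to 1 over [-1, 1] by construction, but a truncated series can dip below zero in low-density regions, and converting through raw moments becomes numerically ill-conditioned at high orders. Both effects are reasons why the choice of truncation order matters in practice.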
