4.7 Article

Hybrid microdata using microaggregation

Journal

INFORMATION SCIENCES
Volume 180, Issue 15, Pages 2834-2844

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2010.04.005

Keywords

Statistical disclosure control; Microdata protection; Privacy-preserving data mining; Synthetic data; Hybrid data; Microaggregation

Funding

  1. Government of Catalonia [2009 SGR 1135]
  2. Spanish Government [TSI2007-65406-C03-01, CSD2007-00004]
  3. European Commission [25200.2005.003-2007.670]

Statistical disclosure control (also known as privacy-preserving data mining) of microdata is about releasing data sets containing the answers of individual respondents, protected in such a way that (i) the respondents corresponding to the released records cannot be re-identified and (ii) the released data stay analytically useful. Usually, the protected data set is generated either by masking (i.e. perturbing) the original data or by generating synthetic (i.e. simulated) data that preserve some pre-selected statistics of the original data. Masked data may approximately preserve a broad range of distributional characteristics, although very few of them (if any) are exactly preserved; synthetic data, on the other hand, exactly preserve the pre-selected statistics and may seem less disclosive than masked data, but they do not preserve any statistics other than those pre-selected. Hybrid data obtained by mixing the original data and synthetic data have been proposed in the literature to combine the strengths of masked and synthetic data. We show how to easily obtain hybrid data by combining microaggregation with any synthetic data generator. We show that numerical hybrid data that exactly preserve the means and covariances of the original data, and approximately preserve other statistics as well as some subdomain analyses, can be obtained as a particular case with a very simple parameterization. The new method is competitive with both the literature on hybrid data and plain multivariate microaggregation. © 2010 Elsevier Inc. All rights reserved.
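
The combination described in the abstract (group the original records, then replace each group with synthetic records) can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not the paper's algorithm: the function names are hypothetical, the grouping step is a simple stand-in (ordering along the first principal component) rather than a proper microaggregation heuristic, and the synthetic generator is one possible choice that matches each group's mean and covariance exactly. Covariances use the divisor n, and each group needs more records than attributes.

```python
import numpy as np


def partition_fixed_size(X, k):
    """Stand-in for a microaggregation heuristic (an assumption of this sketch):
    order records along the first principal component and cut the ordering
    into groups of at least k records."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    order = np.argsort(Xc @ vt[0])
    n_groups = max(len(X) // k, 1)
    # The last group absorbs the remainder, so every group has >= k records.
    cuts = [i * k for i in range(1, n_groups)]
    return np.split(order, cuts)


def synth_matching_moments(G, rng):
    """Draw synthetic records whose mean and covariance (divisor n)
    equal those of the group G exactly."""
    k, d = G.shape
    if k <= d:
        raise ValueError("each group needs more records than attributes")
    mu = G.mean(axis=0)
    S = (G - mu).T @ (G - mu) / k            # target covariance of the group
    Z = rng.standard_normal((k, d))
    Z -= Z.mean(axis=0)                      # noise with exactly zero mean
    Lz = np.linalg.cholesky(Z.T @ Z / k)
    W = np.linalg.solve(Lz, Z.T).T           # whitened noise: W.T @ W / k == I
    vals, vecs = np.linalg.eigh(S)
    A = vecs * np.sqrt(np.clip(vals, 0.0, None))  # A @ A.T == S
    return mu + W @ A.T


def hybrid(X, k, seed=0):
    """Replace every group of original records by synthetic records."""
    rng = np.random.default_rng(seed)
    H = np.empty_like(X, dtype=float)
    for idx in partition_fixed_size(X, k):
        H[idx] = synth_matching_moments(X[idx], rng)
    return H
```

Because every group keeps its size, mean and covariance, the overall mean vector and covariance matrix of the hybrid data set coincide with those of the original up to floating-point error, while the individual records differ. The group size k is the single tuning parameter in this sketch: larger groups move the output further from the original records.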
