4.6 Article

Differentially Private Publication of Vertically Partitioned Data

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TDSC.2019.2905237

Keywords

Publishing; Differential privacy; Distributed databases; Protocols; Privacy; Correlation; data publishing; vertical partitioning; latent tree model

Funding

  1. National Natural Science Foundation of China [61872045]
  2. BUPT Excellent Ph.D. Students Foundation [CX2016301]

This paper focuses on the issue of publishing vertically partitioned data under differential privacy, proposing a differentially private latent tree (DPLT) approach that can generate synthetic datasets while protecting data privacy. Through extensive experiments, it is demonstrated that this method offers desirable data utility with low computation costs.
In this paper, we study the problem of publishing vertically partitioned data under differential privacy, where different attributes of the same set of individuals are held by multiple parties. In this setting, with the assistance of a semi-trusted curator, the involved parties aim to collectively generate an integrated dataset while satisfying differential privacy for each local dataset. Based on the latent tree model (LTM), we present a differentially private latent tree (DPLT) approach, which is, to the best of our knowledge, the first approach to solving this challenging problem. In DPLT, the parties and the curator collaboratively identify the latent tree that best approximates the joint distribution of the integrated dataset, from which a synthetic dataset can be generated. The fundamental advantage of adopting LTM is that the connections between a small number of latent attributes derived from each local dataset can capture the cross-dataset dependencies of the observed attributes in all local datasets. As a result, the joint distribution of the integrated dataset can be learned with little injected noise and low computation and communication costs. DPLT is backed by a series of novel techniques, including two-phase latent attribute generation (TLAG), tree index based correlation quantification (TICQ), and a distributed Laplace perturbation protocol (DLPP). Extensive experiments on real datasets demonstrate that DPLT offers desirable data utility with low computation and communication costs.
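
The abstract names a distributed Laplace perturbation protocol but does not describe it. As a rough illustration of the underlying primitive only, the sketch below applies the standard centralized Laplace mechanism to a histogram and then samples synthetic records from the noisy counts. It is not the DPLT method or the DLPP protocol: there is no latent tree, no curator, and no multi-party computation. The function names (`laplace_noise`, `noisy_histogram`, `sample_synthetic`) and the toy data are hypothetical, and the sensitivity of 1 assumes neighbouring datasets differ by adding or removing one record.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_histogram(records, domain, epsilon, rng):
    # Assumption: add/remove-one-record neighbours, so one record changes
    # exactly one count by 1 and the L1 sensitivity of the histogram is 1.
    counts = Counter(records)
    scale = 1.0 / epsilon
    # Noise is added to every cell of the domain, including empty ones;
    # skipping empty cells would reveal which values are absent.
    return {v: counts.get(v, 0) + laplace_noise(scale, rng) for v in domain}


def sample_synthetic(noisy, n, rng):
    # Post-processing: clip negative counts to zero, then sample in
    # proportion to the clipped counts. This step uses only the noisy
    # output, so it does not consume additional privacy budget.
    values = list(noisy)
    weights = [max(noisy[v], 0.0) for v in values]
    if sum(weights) <= 0:
        weights = [1.0] * len(values)  # fall back to uniform sampling
    return rng.choices(values, weights=weights, k=n)


rng = random.Random(0)
domain = ["A", "B", "C"]
records = ["A"] * 50 + ["B"] * 30 + ["C"] * 20

noisy = noisy_histogram(records, domain, epsilon=1.0, rng=rng)
synthetic = sample_synthetic(noisy, n=100, rng=rng)
```

A smaller `epsilon` gives a larger noise scale and stronger privacy at the cost of a less faithful synthetic sample. Per the abstract, DPLT learns a latent tree so that the joint distribution can be captured with little injected noise, rather than perturbing one full joint histogram as this toy example does.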
