4.5 Article

Distributed K-Means clustering guaranteeing local differential privacy

Journal

COMPUTERS & SECURITY
Volume 90, Issue -, Pages -

Publisher

ELSEVIER ADVANCED TECHNOLOGY
DOI: 10.1016/j.cose.2019.101699

Keywords

Differential privacy; Randomized response; Machine learning; Distributed clustering; K-Means

Funding

  1. National Key R&D Program of China [NSFC-61972195]
  2. [2018YFB1004301]
  3. [NSFC-61872179]
  4. [NSFC-61425024]
  5. [NSFC-61872176]

Ask authors/readers for more resources

In many cases, a service provider might require to aggregate data from end-users to perform mining tasks such as K-means clustering. Nevertheless, since such data often contain sensitive information. In this paper, we propose the first locally differentially private K-means mechanism under this distributed scenario. Differing from standard differentially private clustering mechanisms, the proposed mechanism doesn't need any trusted third party to collect and preprocess users data. Our mechanism first perturbs users data locally to satisfy local differential privacy (LDP). Then it revises the traditional K-means algorithm to allow the service provider to obtain high-quality clustering results by collaborating with users based on the highly perturbed data. We prove that our mechanism can enable high utility clustering while guaranteeing local differential privacy for each user. We also propose an extended mechanism to improve our basic model in terms of privacy and utility. In this mechanism, we perturb both users' sensitive data and the intermediate results of users' clusters in each iteration. Moreover, we consider a more general setting where the users may have different privacy requirements. Extensive experiments are conducted on two real-world datasets, and the results show that our proposal can well preserve the quality of clustering results. (C) 2019 Elsevier Ltd. All rights reserved.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.5
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available