Article

Federated Data Cleaning: Collaborative and Privacy-Preserving Data Cleaning for Edge Intelligence

Journal

IEEE INTERNET OF THINGS JOURNAL
Volume 8, Issue 8, Pages 6757-6770

Publisher

Institute of Electrical and Electronics Engineers (IEEE)
DOI: 10.1109/JIOT.2020.3027980

Keywords

Cleaning; Data privacy; Protocols; Internet of Things; Collaboration; Data models; Servers; Data cleaning; edge intelligence (EI); privacy preserving

Funding

  1. National Key Research and Development Program of China [2018YFE0126000]
  2. Key Program of NSFC-Tongyong Union Foundation [U1636209]
  3. National Natural Science Foundation of China [61902292, 61972453]
  4. Key Research and Development Programs of Shaanxi [2019ZDLGY13-07, 2019ZDLGY13-04]
  5. Fundamental Research Funds for the Central Universities [XJS201502]

The study proposes FedClean, a federated data cleaning protocol for edge intelligence scenarios that cleans data without compromising data privacy. Edge nodes generate Boolean shares of their data, two servers privately compute attribute value frequency (AVF) scores from these shares, and the scores are sorted with a bitonic sorting network so that abnormal (low-scoring) data entries can be filtered out.

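The summary does not spell out FedClean's gate-level construction. As a hedged illustration of what Boolean sharing between two noncolluding servers usually involves, the sketch below uses generic two-party XOR sharing in the GMW style: each value is split into two random-looking shares, XOR and NOT are computed locally, AND uses a Beaver triple plus one round of opening masked values, and an equality test (the kind of primitive needed to count matching attribute values) is built from these gates. All names, the 8-bit word width, and the simulated triple dealer are assumptions rather than details from the paper, and both servers are simulated in one process.

```python
import secrets
from typing import Tuple

BITS = 8                  # assumed word width (a power of two) for encoded values
MASK = (1 << BITS) - 1
Shares = Tuple[int, int]  # (share held by server 0, share held by server 1)


def share(x: int) -> Shares:
    """Split x into two XOR shares; either share alone is uniformly random."""
    r = secrets.randbits(BITS)
    return r, (x ^ r) & MASK


def reconstruct(s: Shares) -> int:
    """Recombine both shares (only for opened or final values)."""
    return s[0] ^ s[1]


def xor_shares(x: Shares, y: Shares) -> Shares:
    """XOR gate: each server XORs its own shares locally, no communication."""
    return x[0] ^ y[0], x[1] ^ y[1]


def not_shares(x: Shares) -> Shares:
    """Bitwise NOT: only server 0 flips its share."""
    return (~x[0]) & MASK, x[1]


def and_shares(x: Shares, y: Shares) -> Shares:
    """Bitwise AND gate using a Beaver triple (a, b, c = a & b).

    The triple comes from a simulated dealer (an assumption). The servers
    open d = x ^ a and e = y ^ b in one communication round; because a and
    b are random, d and e reveal nothing about x or y.
    """
    a_val, b_val = secrets.randbits(BITS), secrets.randbits(BITS)
    a, b, c = share(a_val), share(b_val), share(a_val & b_val)
    d = reconstruct(xor_shares(x, a))
    e = reconstruct(xor_shares(y, b))
    z0 = c[0] ^ (d & b[0]) ^ (e & a[0]) ^ (d & e)  # only server 0 adds d & e
    z1 = c[1] ^ (d & b[1]) ^ (e & a[1])
    return z0, z1


def equal_shares(x: Shares, y: Shares) -> Shares:
    """Shares of the bit [x == y], computed without revealing x or y."""
    # x == y exactly when every bit of NOT(x ^ y) is 1.
    t = not_shares(xor_shares(x, y))
    # AND-fold the word onto bit 0 in log2(BITS) rounds; right shift is
    # linear over XOR, so each server shifts its own share locally.
    width = BITS
    while width > 1:
        width //= 2
        t = and_shares(t, (t[0] >> width, t[1] >> width))
    return t[0] & 1, t[1] & 1


if __name__ == "__main__":
    x, y, z = share(42), share(42), share(7)
    print(reconstruct(equal_shares(x, y)))                # 1
    print(reconstruct(equal_shares(x, z)))                # 0
    print(reconstruct(and_shares(share(12), share(10))))  # 8
```

In a real deployment each share would live on a separate server and only the masked values d and e would be exchanged. Summing the equality bits into frequency counts would additionally require arithmetic on shares, which this sketch leaves out.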
As an important driver of emerging Internet-of-Things (IoT) applications, machine learning algorithms currently face the challenge of cleaning data noise introduced during the training process (e.g., by asynchronous execution and by lossy data compression and quantization). To ensure data quality, various data cleaning approaches have been proposed that filter out abnormal data entries based on the global data distribution. However, most existing approaches follow a centralized paradigm and thus cannot be applied to future edge-based IoT applications, where each edge node (EN) has only a limited view of the global data distribution. Moreover, the increasing demand for privacy preservation largely prevents ENs from combining their data for centralized cleaning. In this study, we propose a federated data cleaning protocol, called FedClean, for edge intelligence (EI) scenarios, designed to clean data without compromising data privacy. More specifically, the ENs first generate Boolean shares of their data and distribute them to two noncolluding servers. These two servers then run the FedClean protocol to privately and efficiently compute the attribute value frequency (AVF) scores of the collected data entries, which are then sorted in ascending order via a bitonic sorting network without revealing their values. Data entries with lower AVF scores are considered abnormal and are filtered out. The security, efficiency, and effectiveness of the proposed approach are demonstrated via concrete security analysis and comprehensive experiments.
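The abstract does not give the AVF formula. The sketch below assumes the standard attribute value frequency definition from the outlier-detection literature (the AVF score of an entry is the mean, over its m attributes, of the number of entries sharing its value in that attribute) and computes it in plaintext, to show what the servers are meant to obtain privately. It then ranks entries with a bitonic sorting network and flags the lowest-scoring ones. The function names and the `num_flagged` cut-off are hypothetical; the paper may use a different filtering rule.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Record = Tuple[str, ...]  # one data entry with m categorical attributes


def avf_scores(records: Sequence[Record]) -> List[float]:
    """AVF score of each entry: mean, over its attributes, of how many
    entries share that attribute's value. Rare value combinations score low."""
    if not records:
        return []
    m = len(records[0])
    counts = [Counter(r[col] for r in records) for col in range(m)]
    return [sum(counts[col][r[col]] for col in range(m)) / m for r in records]


def bitonic_sort(items: List[Tuple[float, int]]) -> List[Tuple[float, int]]:
    """Sort (score, index) pairs ascending with a bitonic sorting network.

    Which positions are compared depends only on len(items), never on the
    values. That fixed schedule is what lets a sorting network run over
    secret-shared scores; here the compare-exchanges are done in plaintext.
    """
    a = list(items)
    n = len(a)
    assert n & (n - 1) == 0, "network size must be a power of two"
    k = 2
    while k <= n:          # size of the bitonic blocks being merged
        j = k // 2
        while j > 0:       # distance between compared positions
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    # Compare-exchange in the direction of the current block.
                    if (ascending and a[i] > a[partner]) or (
                        not ascending and a[i] < a[partner]
                    ):
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


def flag_abnormal(records: Sequence[Record], num_flagged: int) -> List[int]:
    """Indices of the num_flagged entries with the lowest AVF scores."""
    pairs = [(s, i) for i, s in enumerate(avf_scores(records))]
    n_real = len(pairs)
    size = 1
    while size < n_real:
        size *= 2
    # Pad to a power of two with +inf scores so padding always sorts last.
    pairs += [(float("inf"), n_real + p) for p in range(size - n_real)]
    ranked = bitonic_sort(pairs)
    return [i for _, i in ranked[: min(num_flagged, n_real)]]


if __name__ == "__main__":
    data = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "z"), ("a", "x")]
    print(avf_scores(data))        # [3.5, 3.5, 2.5, 1.0, 3.5]
    print(flag_abnormal(data, 2))  # [3, 2]
```

A bitonic network performs O(n log² n) compare-exchanges, more than a comparison sort such as merge sort, but its data-independent schedule is what makes it usable while the scores remain secret-shared.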
