Federated Data Cleaning: Collaborative and Privacy-Preserving Data Cleaning for Edge Intelligence

IEEE INTERNET OF THINGS JOURNAL (2021)

Journal

IEEE INTERNET OF THINGS JOURNAL

Volume 8, Issue 8, Pages 6757-6770

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/JIOT.2020.3027980

Keywords

Cleaning; Data privacy; Protocols; Internet of Things; Collaboration; Data models; Servers; Data cleaning; edge intelligence (EI); privacy preserving

Funding

National Key Research and Development Program of China [2018YFE0126000]
Key Program of NSFC-Tongyong Union Foundation [U1636209]
National Natural Science Foundation of China [61902292, 61972453]
Key Research and Development Programs of Shaanxi [2019ZDLGY13-07, 2019ZDLGY13-04]
Fundamental Research Funds for the Central Universities [XJS201502]

Automated Summary

The study proposes FedClean, a federated data cleaning protocol for edge intelligence scenarios that cleans data without compromising data privacy. Edge nodes generate Boolean shares of their data, two noncolluding servers privately compute attribute value frequency (AVF) scores and sort them with a bitonic sorting network, and the entries with the lowest scores are filtered out as abnormal.

Abstract

As an important driving factor of emerging Internet-of-Things (IoT) applications, machine learning algorithms currently face the challenge of cleaning data noise that is introduced during the training process (e.g., by asynchronous execution and by lossy data compression and quantization). In an attempt to guarantee data quality, various data cleaning approaches have been proposed to filter out abnormal data entries based on the global data distribution. However, most existing data cleaning approaches are based on a centralized paradigm and thus cannot be applied to future edge-based IoT applications, where each edge node (EN) has only a limited view of the global data distribution. Moreover, the increasing demand for privacy preservation largely prevents ENs from combining their data for centralized cleaning. In this study, we propose a federated data cleaning protocol, named FedClean, for edge intelligence (EI) scenarios that is designed to achieve data cleaning without compromising data privacy. More specifically, different ENs first generate Boolean shares of their data and distribute them to two noncolluding servers. These two servers then run the FedClean protocol to privately and efficiently compute the attribute value frequency (AVF) scores of the collected data entries, which are then sorted in ascending order via a bitonic sorting network without revealing their values. As a result, data entries with lower AVF scores are considered abnormal and filtered out. The security, efficiency, and effectiveness of the proposed approach are then demonstrated via concrete security analysis and comprehensive experiments.
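
The three building blocks named in the abstract can be illustrated in plaintext. The following is a minimal sketch and not the FedClean protocol itself: it assumes the common definition of the AVF score (the mean, over attributes, of how often each of an entry's attribute values occurs in its column), it computes the scores in the clear rather than on shares, and all function names (`share_boolean`, `reconstruct`, `avf_scores`, `bitonic_sort_indices`) are hypothetical. The XOR sharing only shows what a Boolean share is; the real protocol would evaluate the scoring and the compare-exchange steps on the shares held by the two servers.

```python
import secrets
from collections import Counter


def share_boolean(value, bits=32):
    """Split an integer into two XOR (Boolean) shares.

    Each share on its own is a uniformly random bit string.
    """
    mask = secrets.randbits(bits)
    return mask, value ^ mask


def reconstruct(share0, share1):
    """Recombine two XOR shares into the original value."""
    return share0 ^ share1


def avf_scores(rows):
    """Plaintext AVF score per row (assumed definition): the mean
    frequency of the row's attribute values within their columns."""
    n_attrs = len(rows[0])
    counts = [Counter(row[j] for row in rows) for j in range(n_attrs)]
    return [
        sum(counts[j][row[j]] for j in range(n_attrs)) / n_attrs
        for row in rows
    ]


def bitonic_sort_indices(keys):
    """Return row indices ordered by ascending key using a bitonic network.

    The sequence of compare-exchange positions depends only on len(keys),
    not on the key values, which is why such networks suit private sorting.
    The length must be a power of two in this sketch.
    """
    n = len(keys)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    idx = list(range(n))
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    a, b = keys[idx[i]], keys[idx[partner]]
                    if (ascending and a > b) or (not ascending and a < b):
                        idx[i], idx[partner] = idx[partner], idx[i]
            j //= 2
        k *= 2
    return idx


if __name__ == "__main__":
    # Made-up categorical entries for illustration only.
    rows = [
        ("a", "x"), ("a", "x"), ("a", "x"), ("a", "y"),
        ("a", "x"), ("b", "x"), ("a", "x"), ("c", "z"),
    ]
    scores = avf_scores(rows)
    order = bitonic_sort_indices(scores)
    # Entries at the front of `order` have the lowest AVF scores and
    # would be the candidates for filtering.
    print(order[0], scores[order[0]])
```

In this toy input the entry `("c", "z")` shares no attribute value with any other entry, so it receives the lowest score and sorts to the front. How many low-scoring entries to drop is a threshold choice that this sketch leaves open.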
