IDEA: A Utility-Enhanced Approach to Incomplete Data Stream Anonymization

TSINGHUA SCIENCE AND TECHNOLOGY (2022)

Journal

TSINGHUA SCIENCE AND TECHNOLOGY

Volume 27, Issue 1, Pages 127-140

Publisher

TSINGHUA UNIV PRESS

DOI: 10.26599/TST.2020.9010031

Keywords

anonymization; generalization; incomplete data streams; privacy preservation; utility

Funding

National Natural Science Foundation of China [U19A2081, 61802270]
Fundamental Research Funds for the Central Universities [2020SCUNG129]

Automated Summary

Missing values are prevalent in data streams collected in real environments and cannot be ignored in data stream privacy preservation. However, most existing privacy preservation methods do not consider missing values. This study proposes a utility-enhanced approach called Incomplete Data strEam Anonymization (IDEA) to balance the utility and privacy preservation of incomplete data streams. The proposed approach introduces a sliding-window-based processing framework to continuously anonymize data streams and enables clustering between incomplete records and complete records to generate clusters with minimal information loss. Furthermore, a generalization method based on maybe match is proposed to avoid missing value pollution.

Abstract

Missing values are so prevalent in data streams collected in real environments that they cannot be ignored in the privacy preservation of data streams. However, most privacy preservation methods have been developed without considering missing values. A few studies allow incomplete records to participate in data anonymization but introduce considerable additional information loss. To balance the utility and privacy preservation of incomplete data streams, we present a utility-enhanced approach for Incomplete Data strEam Anonymization (IDEA). In this approach, a sliding-window-based processing framework is introduced to anonymize data streams continuously, in which each tuple is output either through clustering or through an existing anonymized cluster. We use both the attribute dimension and the tuple dimension in the similarity measurement, which enables clustering between incomplete records and complete records and generates clusters with minimal information loss. To avoid missing value pollution, we propose a generalization method based on maybe match for generalizing incomplete data. Experiments conducted on real datasets show that the proposed approach can efficiently anonymize incomplete data streams while effectively preserving utility.
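The abstract does not spell out the generalization procedure, so the following is only a minimal sketch of one possible reading of maybe-match generalization, not the authors' algorithm. It assumes numeric quasi-identifiers, uses `None` to mark a missing value, and treats a missing value as potentially matching any value, so it is left out of the interval computation instead of widening the attribute for the whole cluster. The function name `generalize_maybe_match` and the sample records are hypothetical.

```python
from typing import List, Optional, Tuple

# Assumption: a record is a tuple of numeric quasi-identifiers,
# with None marking a missing value.
Record = Tuple[Optional[float], ...]
Interval = Tuple[float, float]


def generalize_maybe_match(
    cluster: List[Record],
) -> List[Tuple[Optional[Interval], ...]]:
    """Generalize each attribute to a (min, max) interval over known values.

    Missing values stay missing in the output. Under the assumed
    maybe-match reading they may match any value, so they do not
    force the attribute to be suppressed for the complete records.
    """
    n_attrs = len(cluster[0])
    intervals: List[Optional[Interval]] = []
    for j in range(n_attrs):
        known = [r[j] for r in cluster if r[j] is not None]
        # If every value is missing, no interval can be formed.
        intervals.append((min(known), max(known)) if known else None)
    return [
        tuple(None if r[j] is None else intervals[j] for j in range(n_attrs))
        for r in cluster
    ]


# Hypothetical cluster with attributes (age, income); one income is missing.
cluster = [(25, 50000), (31, None), (28, 62000)]
anonymized = generalize_maybe_match(cluster)
```

In this sketch the incomplete record keeps its missing income while sharing the age interval with the complete records, so one missing value does not degrade the income attribute of the other two tuples.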
