4.7 Article

Protecting the anonymity of online users through Bayesian data synthesis

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 216, Issue -, Pages -

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2022.119409

Keywords

Data privacy; User-generated content; Bayesian data synthesis; Structured data; Textual content

Privacy concerns emerge when online users of popular user-generated content (UGC) platforms are identified through a combination of their structured data (e.g., location and name) and textual content (e.g., word choices and writing style). To overcome this problem, we introduce a Bayesian sequential synthesis methodology that lets organizations share structured data together with textual content. The proposed approach enables platforms to control the privacy level of their released UGC data with a single shrinkage parameter. Our results show that the synthesis strategy decreases the probability of identifying a user to an acceptable threshold while maintaining much of the textual content present in the structured data. Additionally, we find that the value of sharing our protected data exceeds that of sharing the unprotected structured data and textual content separately. These findings encourage UGC platforms that wish to be known for consumer privacy to protect the anonymity of their online users with synthetic data.
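
To make the idea of a single shrinkage parameter concrete, the sketch below synthesizes one categorical structured attribute (for example, location) for a group of users. It is not the model from the paper: the Dirichlet-multinomial setup and the names `synthesis_probs`, `synthesize_attribute`, `global_probs`, and `shrinkage` are assumptions chosen for illustration. Here `shrinkage` acts as a prior pseudo-count that pulls the synthesis distribution away from the group's observed counts and toward a platform-wide distribution.

```python
import numpy as np


def synthesis_probs(counts, global_probs, shrinkage):
    """Posterior-mean category probabilities under a Dirichlet prior.

    Hypothetical illustration, not the paper's model. The prior is
    Dirichlet(shrinkage * global_probs), so `shrinkage` is the number of
    pseudo-observations drawn from the platform-wide distribution.
    """
    counts = np.asarray(counts, dtype=float)
    global_probs = np.asarray(global_probs, dtype=float)
    if shrinkage < 0:
        raise ValueError("shrinkage must be non-negative")
    total = counts.sum() + shrinkage
    if total == 0:
        # No data and no prior weight: fall back to the global distribution.
        return global_probs
    return (counts + shrinkage * global_probs) / total


def synthesize_attribute(counts, global_probs, shrinkage, n, rng):
    """Draw n synthetic category indices from the shrunken distribution.

    Each value is drawn independently from the single-observation
    posterior predictive, which equals the posterior mean above.
    """
    probs = synthesis_probs(counts, global_probs, shrinkage)
    return rng.choice(len(probs), size=n, p=probs)


# Observed counts for one group of users over three locations (made-up numbers).
counts = [8, 1, 1]
# Platform-wide location shares (made-up numbers).
global_probs = [0.2, 0.3, 0.5]

rng = np.random.default_rng(0)
low_privacy = synthesis_probs(counts, global_probs, shrinkage=0.0)
mid_privacy = synthesis_probs(counts, global_probs, shrinkage=10.0)
synthetic = synthesize_attribute(counts, global_probs, 10.0, n=5, rng=rng)
```

With `shrinkage=0` the synthetic values follow the group's observed proportions, which preserves utility but offers little protection. As `shrinkage` grows, the synthesis distribution approaches the platform-wide shares, so released values say less about any particular group.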
