Article

Privacy preserving data visualizations

EPJ DATA SCIENCE (2021)

Journal

EPJ DATA SCIENCE

Volume 10, Issue 1

Publisher

SPRINGER

DOI: 10.1140/epjds/s13688-020-00257-4

Keywords

Sensitive data; Data visualizations; Disclosure control; Privacy protection; Anonymization

Categories

Mathematics, Interdisciplinary Applications; Social Sciences, Mathematical Methods

Funding

European Commission [824989]
European Union's Horizon 2020 research and innovation programme [874583]
UK Department of Health and Social Care [RES/0150/7943/202]
Wellcome Trust [102215, 108439/Z/15/Z, MR/N01104X/1, MR/N01104X/2]
Medical Research Council [108439/Z/15/Z, MR/N01104X/1, MR/N01104X/2]
Economic and Social Research Council [MR/N01104X/1, MR/N01104X/2]
Health Data Research UK [MR/S003959/1]
National Institute for Health Research Applied Research Collaboration
Public Health England
European Union [786247]
UK's Medical Research Council [MC_PC_17210]
UKRI Innovation Fellowship
Marie Curie Actions (MSCA) [786247] Funding Source: Marie Curie Actions (MSCA)
MRC [MR/S003959/1] Funding Source: UKRI
Wellcome Trust [108439/Z/15/Z] Funding Source: Wellcome Trust

Summary

Data visualizations are valuable tools that graphically reveal information about data structures, properties, and relationships between variables. However, in sensitive fields like medicine and social sciences, restrictions are placed on sharing individual-level records to protect privacy. Anonymization techniques such as k-anonymization and probabilistic perturbation can be used to generate privacy-preserving visualizations while adhering to data protection laws. These methods allow for exploration and inferential analysis while maintaining data confidentiality.

Abstract

Data visualizations are a valuable tool during both statistical analysis and the interpretation of results, as they graphically reveal useful information about the structure, properties and relationships between variables that may otherwise be concealed in tabulated data. In disciplines like medicine and the social sciences, where collected data include sensitive information about study participants, the sharing and publication of individual-level records is controlled by data protection laws and ethico-legal norms. Because data visualizations such as graphs and plots may be linked to other released information and used to identify study participants and their personal attributes, their creation is often prohibited by the terms of data use. These restrictions are enforced to reduce the risk of breaching data subject confidentiality; however, they prevent analysts from displaying useful descriptive plots of their research features and findings. Here we propose the use of anonymization techniques to generate privacy-preserving visualizations that retain the statistical properties of the underlying data while still adhering to strict data disclosure rules. We demonstrate the use of (i) the well-known k-anonymization process, which preserves privacy by reducing the granularity of the data using suppression and generalization, (ii) a novel deterministic approach that replaces each individual-level observation with the centroid of its k nearest neighbours, and (iii) a probabilistic procedure that perturbs individual attributes by adding random noise. We apply the proposed methods to generate privacy-preserving data visualizations for exploratory data analysis and inferential regression plot diagnostics, and we discuss their strengths and limitations.
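To make the three approaches in the abstract more concrete, the following is a minimal Python sketch of each idea applied to the data behind a plot. It is not the authors' implementation. The function names, the use of Euclidean distance on unscaled columns, counting a point as its own neighbour, fixed-width bins for generalization, and Gaussian noise scaled by each column's standard deviation are all assumptions made for illustration.

```python
import numpy as np


def generalise_and_suppress(values: np.ndarray, bin_width: float, k: int) -> np.ndarray:
    """Simplified one-variable k-anonymization (hypothetical helper).

    Generalization: each value is replaced by the centre of its fixed-width bin.
    Suppression: values falling in bins with fewer than k records are dropped.
    """
    bins = np.floor(values / bin_width).astype(int)
    labels, counts = np.unique(bins, return_counts=True)
    keep = np.isin(bins, labels[counts >= k])
    return (bins[keep] + 0.5) * bin_width


def knn_centroids(points: np.ndarray, k: int) -> np.ndarray:
    """Replace each row by the centroid of its k nearest rows (hypothetical helper).

    Assumption: Euclidean distance, and a point is one of its own neighbours.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Pairwise squared Euclidean distances, shape (n, n).
    diffs = points[:, None, :] - points[None, :, :]
    sq_dists = np.einsum("ijk,ijk->ij", diffs, diffs)
    # Column indices of the k smallest distances in every row.
    nearest = np.argpartition(sq_dists, k - 1, axis=1)[:, :k]
    # points[nearest] has shape (n, k, d); average over the k neighbours.
    return points[nearest].mean(axis=1)


def add_noise(points: np.ndarray, scale: float, rng: np.random.Generator) -> np.ndarray:
    """Perturb each column with zero-mean Gaussian noise (hypothetical helper).

    Assumption: the noise standard deviation is `scale` times the sample
    standard deviation of the column it is added to.
    """
    sd = points.std(axis=0, ddof=1)
    return points + rng.normal(0.0, 1.0, size=points.shape) * (scale * sd)


if __name__ == "__main__":
    rng = np.random.default_rng(seed=1)
    xy = rng.normal(size=(200, 2))           # synthetic stand-in for sensitive data
    safe_knn = knn_centroids(xy, k=3)        # deterministic anonymization
    safe_noisy = add_noise(xy, 0.25, rng)    # probabilistic anonymization
    safe_binned = generalise_and_suppress(xy[:, 0], bin_width=0.5, k=3)
    print(safe_knn.shape, safe_noisy.shape, safe_binned.shape)
```

In a sketch like this, the anonymized arrays rather than the original records would be passed to the plotting routine. The values of k, the bin width, and the noise scale control the trade-off between disclosure risk and fidelity to the underlying data, and would need to be chosen according to the applicable disclosure rules.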
