Article

Protecting Privacy in Large Datasets-First We Assess the Risk; Then We Fuzzy the Data

CANCER EPIDEMIOLOGY BIOMARKERS & PREVENTION (2017)

Journal

CANCER EPIDEMIOLOGY BIOMARKERS & PREVENTION

Volume 26, Issue 8, Pages 1219-1224

Publisher

AMER ASSOC CANCER RESEARCH

DOI: 10.1158/1055-9965.EPI-17-0172

Categories

Oncology; Public, Environmental & Occupational Health

Abstract

Background: Privacy of information is a growing concern as large amounts of data on many individuals become available. Even when access to data is tightly controlled and the data shared with researchers contain no personally identifying information, individuals can potentially be reidentified. Several anonymization protocols are available to prevent this. They include grouping variables into broader categories so that each category contains more than one individual (as in k-anonymization), as well as protocols that add noise to the data. However, data custodians rarely assess reidentification risks.

Methods: We assessed the reidentification risk of a large, realistic dataset based on screening data comprising over 5 million records on 0.9 million women in the Norwegian Cervical Cancer Screening Program, before and after applying old and new techniques for adding noise to (fuzzifying) the data.

Results: Categorizing date variables (applying k-anonymization) substantially reduced the possibility of reidentifying individuals. Adding a random factor, such as the fuzzy factor used here, makes it even more difficult to reidentify specific individuals.

Conclusions: Our results show that simple techniques can substantially reduce the risk of reidentification.

Impact: Registry owners and large-scale data custodians should consider estimating and, if necessary, reducing reidentification risks before sharing large datasets. © 2017 AACR.
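The three ideas named in the abstract (measuring reidentification risk, categorizing date variables, and adding a random factor) can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not specify how the fuzzy factor is constructed, so the toy records, the function names (`smallest_group`, `share_unique`, `coarsen`, `fuzz`), the month-level categories, and the choice of one random day shift per individual are all assumptions made for illustration.

```python
import random
from collections import Counter
from datetime import date, timedelta

# Hypothetical toy records: (birth date, screening date) for six individuals.
records = [
    (date(1970, 3, 14), date(2010, 5, 2)),
    (date(1970, 3, 29), date(2010, 5, 20)),
    (date(1970, 7, 1), date(2011, 1, 9)),
    (date(1970, 7, 22), date(2011, 1, 30)),
    (date(1982, 11, 5), date(2012, 6, 17)),
    (date(1982, 11, 20), date(2012, 6, 3)),
]


def smallest_group(rows):
    """Size of the smallest group of rows with identical values (the k in k-anonymity)."""
    return min(Counter(rows).values())


def share_unique(rows):
    """Fraction of rows whose combination of values occurs exactly once."""
    counts = Counter(rows)
    return sum(1 for row in rows if counts[row] == 1) / len(rows)


def coarsen(rows):
    """Replace each exact date with its (year, month) category."""
    return [tuple((d.year, d.month) for d in row) for row in rows]


def fuzz(rows, max_shift_days=30, seed=0):
    """Shift all dates of one individual by the same random number of days.

    Assumption: a single offset per individual, so intervals between that
    individual's dates are preserved. The paper's fuzzy factor may differ.
    """
    rng = random.Random(seed)
    fuzzed = []
    for row in rows:
        shift = timedelta(days=rng.randint(-max_shift_days, max_shift_days))
        fuzzed.append(tuple(d + shift for d in row))
    return fuzzed


if __name__ == "__main__":
    print("share unique, exact dates:", share_unique(records))
    print("smallest group, month categories:", smallest_group(coarsen(records)))
    print("first record after fuzzing:", fuzz(records)[0])
```

In this toy data every record is unique on exact dates, while month-level categories leave no group smaller than two. Using one offset per individual keeps within-person intervals intact, which tends to matter for screening histories; independent noise per date would be a different trade-off.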
