A computational model to protect patient data from location-based re-identification

ARTIFICIAL INTELLIGENCE IN MEDICINE (2007)

Journal

ARTIFICIAL INTELLIGENCE IN MEDICINE

Volume 40, Issue 3, Pages 223-239

Publisher

ELSEVIER SCIENCE BV

DOI: 10.1016/j.artmed.2007.04.002

Keywords

privacy; confidentiality; genomics; databases; electronic medical records; distributed systems; graphical models

Categories

Computer Science, Artificial Intelligence; Engineering, Biomedical; Medical Informatics

Abstract

Objective: Health care organizations must preserve a patient's anonymity when disclosing personal data. Traditionally, patient identity has been protected by stripping identifiers from sensitive data such as DNA. However, simple automated methods can re-identify patient data using public information. In this paper, we present a solution to prevent a threat to patient anonymity that arises when multiple health care organizations disclose data. In this setting, a patient's location visit pattern, or trail, can re-identify seemingly anonymous DNA to patient identity. This threat exists because health care organizations (1) cannot prevent the disclosure of certain types of patient information and (2) do not know how to systematically avoid trail re-identification. In this paper, we develop and evaluate computational methods that health care organizations can apply to disclose patient-specific DNA records that are impregnable to trail re-identification.

Methods and materials: To prevent trail re-identification, we introduce a formal model called k-unlinkability, which enables health care administrators to specify different degrees of patient anonymity. Specifically, k-unlinkability is satisfied when the trail of each DNA record is linkable to no fewer than k identified records. We present several algorithms that enable health care organizations to coordinate their data disclosure, so that they can determine which DNA records can be shared without violating k-unlinkability. We evaluate the algorithms with the trails of patient populations derived from publicly available hospital discharge databases. Algorithm efficacy is evaluated using metrics based on real-world applications, including the number of suppressed records and the number of organizations that disclose records.

Results: Our experiments indicate that it is unnecessary to suppress all patient records that initially violate k-unlinkability. Rather, only portions of the trails need to be suppressed. For example, if each hospital discloses 100% of its data on patients diagnosed with cystic fibrosis, then 48% of the DNA records are 5-unlinkable. A naive solution would suppress the 52% of the DNA records that violate 5-unlinkability. However, by applying our protection algorithms, the hospitals can disclose 95% of the DNA records, all of which are 5-unlinkable. Similar findings hold for all populations studied.

Conclusion: This research demonstrates that patient anonymity can be formally protected in shared databases. Our findings illustrate that significant quantities of patient-specific data can be disclosed with provable protection from trail re-identification. The configurability of our methods allows health care administrators to quantify the effects of different levels of privacy protection and formulate policy accordingly. (C) 2007 Elsevier B.V. All rights reserved.
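To make the k-unlinkability condition and the naive suppression baseline from the abstract concrete, here is a minimal Python sketch. It is not the paper's algorithm. It assumes a trail is simply the set of locations where a record appears, and that a DNA trail is "linkable" to an identified record when every location in the DNA trail also occurs in that record's trail; the abstract does not spell out the linkage rule, so this is an illustrative assumption. The names `linkable`, `link_counts`, `violators`, `naive_suppress`, and the toy data are all hypothetical.

```python
from typing import Dict, FrozenSet, List

# A trail is modeled here as the set of locations (e.g. hospitals) at which
# a record appears. This representation is an assumption for illustration.
Trail = FrozenSet[str]


def linkable(dna_trail: Trail, identified_trail: Trail) -> bool:
    """Assumed linkage rule: the DNA trail is a subset of the identified trail."""
    return dna_trail <= identified_trail


def link_counts(dna: Dict[str, Trail], identified: Dict[str, Trail]) -> Dict[str, int]:
    """For each DNA record, count the identified records it is linkable to."""
    return {
        rid: sum(linkable(trail, id_trail) for id_trail in identified.values())
        for rid, trail in dna.items()
    }


def violators(dna: Dict[str, Trail], identified: Dict[str, Trail], k: int) -> List[str]:
    """DNA records linkable to fewer than k identified records."""
    counts = link_counts(dna, identified)
    return sorted(rid for rid, count in counts.items() if count < k)


def naive_suppress(dna: Dict[str, Trail], identified: Dict[str, Trail], k: int) -> Dict[str, Trail]:
    """Naive baseline: drop every DNA record that violates k-unlinkability."""
    bad = set(violators(dna, identified, k))
    return {rid: trail for rid, trail in dna.items() if rid not in bad}


# Toy data (hypothetical): four identified patients, three DNA records.
identified = {
    "alice": frozenset({"H1", "H2"}),
    "bob": frozenset({"H1", "H2"}),
    "carol": frozenset({"H1"}),
    "dave": frozenset({"H3"}),
}
dna = {
    "d1": frozenset({"H1"}),
    "d2": frozenset({"H1", "H2"}),
    "d3": frozenset({"H3"}),
}

if __name__ == "__main__":
    print(link_counts(dna, identified))
    print(violators(dna, identified, k=2))
    print(sorted(naive_suppress(dna, identified, k=2)))
```

In this toy data, `d3` is linkable only to `dave`, so it violates 2-unlinkability and the naive baseline suppresses the whole record. The abstract reports that suppressing only portions of trails, rather than whole records, lets far more records be disclosed; that coordinated approach is not reproduced here.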
