4.7 Article

A computational model to protect patient data from location-based re-identification

期刊

ARTIFICIAL INTELLIGENCE IN MEDICINE
卷 40, 期 3, 页码 223-239

出版社

ELSEVIER SCIENCE BV
DOI: 10.1016/j.artmed.2007.04.002

关键词

privacy; confidentiality; genomics; databases; electronic medical records; distributed systems; graphical models

向作者/读者索取更多资源

Objective: Health care organizations must preserve a patient's anonymity when disclosing personal data. Traditionally, patient identity has been protected by stripe ping identifiers from sensitive data such as DNA. However, simple automated methods can re-identify patient data using public information. In this paper, we present a solution to prevent a threat to patient anonymity that arises when multiple health care organizations disclose data. In this setting, a patient's location visit pattern, or trail, can re-identify seemingly anonymous DNA to patient identity. This threat exists because health care organizations (1) cannot prevent the disclosure of certain types of patient information and (2) do not know how to systematically avoid trait re-identification. In this paper, we develop and evaluate computational methods that health care organizations can apply to disclose patient-specific DNA records that are impregnable to trait re-identification. Methods and materials: To prevent trail re-identification, we introduce a formal model called k-unlinkability, which enables health care administrators to specify different degrees of patient anonymity. Specifically, k-unlinkability is satisfied when the trail of each DNA record is linkable to no less than k identified records. We present several. algorithms that enable health care organizations to coordinate their data disclosure, so that they can determine which DNA records can be shared without violating k-unlinkability. We evaluate the algorithms with the traits of patient populations derived from publicly available hospital discharge databases. Algorithm efficacy is evaluated using metrics based on real world applications, including the number of suppressed records and the number of organizations that disclose records. Results: Our experiments indicate that it is unnecessary to suppress all patient records that initially violate k-unlinkability. Rather, only portions of the traits need to be suppressed. For example, if each hospital discloses 100% of its data on patients diagnosed with cystic fibrosis, then 48% of the DNA records are 5-unlinkable. A naive solution would suppress the 52% of the DNA records that violate 5-unlinkability. However, by applying our protection algorithms, the hospitals can disclose 95% of the DNA records, all of which are 5-unlinkable. Similar findings hold for all populations studied. Conclusion: This research demonstrates that patient anonymity can be formally protected in shared databases. Our findings illustrate that significant quantities of patient- specific data can be disclosed with provable protection from trail re-identification. The configurability of our methods allows health care administrators to quantify the effects of different levels of privacy protection and formulate policy accordingly. (C) 2007 Elsevier B.V. All rights reserved.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.7
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据