4.3 Article

HDDA: DataSifter: statistical obfuscation of electronic health records and other sensitive datasets

Journal

Publisher

TAYLOR & FRANCIS LTD
DOI: 10.1080/00949655.2018.1545228

Keywords

Data sharing; personal privacy; information protection; Big Data; statistical method

Funding

  1. National Science Foundation (NSF) [1734853, 1636840, 1416953, 0716055, 1023115]
  2. National Institutes of Health (NIH) [P20 NR015331, U54 EB020406, P50 NS091856, P30 DK089503, P30AG053760, UL1TR002240]
  3. Elsie Andresen Fiske Research Fund
  4. Michigan Institute for Data Science
  5. Directorate for Computer & Information Science & Engineering
  6. Office of Advanced Cyberinfrastructure (OAC) [1636840] Funding Source: National Science Foundation

There are no practical and effective mechanisms for sharing high-dimensional data that include sensitive information, in fields such as health, finance, intelligence, or socioeconomics, without either compromising the utility of the data or exposing private personal or secure organizational information. Excessive scrambling or encoding of the information makes it less useful for modelling or analytical processing. Insufficient preprocessing may compromise sensitive information and introduce a substantial risk of re-identification of individuals by various stratification techniques. To address this problem, we developed a novel statistical obfuscation method (DataSifter) for on-the-fly de-identification of structured and unstructured sensitive high-dimensional data, such as clinical data from electronic health records (EHR). DataSifter provides complete administrative control over the balance between the risk of data re-identification and the preservation of the data information. Simulation results suggest that DataSifter can provide privacy protection while maintaining data utility for different types of outcomes of interest. The application of DataSifter to a large autism dataset provides a realistic demonstration of its promise for practical applications.
