Article

Evaluating bias due to data linkage error in electronic healthcare records

Journal

BMC MEDICAL RESEARCH METHODOLOGY
Volume 14, Issue -, Pages -

Publisher

BMC
DOI: 10.1186/1471-2288-14-36

Keywords

Data linkage; Routine data; Bias; Electronic health records; Evaluation; Linkage quality

Funding

  1. CATCH trial from the National Institute for Health Research Health Technology Assessment (NIHR HTA) programme [08/13/47]
  2. National Clinical Audit and Patient Outcomes Programme via Healthcare Quality Improvement Partnership (HQIP)
  3. Health Commission Wales Specialised Services
  4. NHS Lothian/National Service Division NHS Scotland
  5. Royal Belfast Hospital for Sick Children
  6. Our Lady's Children's Hospital, Crumlin
  7. Children's University Hospital, Temple Street
  8. Harley Street Clinic, London
  9. Medical Research Council [MR/K006584/1] Funding Source: researchfish
  10. National Institute for Health Research [08/13/47] Funding Source: researchfish


Background: Linkage of electronic healthcare records is becoming increasingly important for research purposes. However, linkage error due to mis-recorded or missing identifiers can lead to biased results. We evaluated the impact of linkage error on estimated infection rates using two different methods for classifying links: highest-weight (HW) classification using probabilistic match weights and prior-informed imputation (PII) using match probabilities.

Methods: A gold-standard dataset was created through deterministic linkage of unique identifiers in admission data from two hospitals and infection data recorded at the hospital laboratories (original data). Unique identifiers were then removed and data were re-linked by date of birth, sex and Soundex using two classification methods: i) HW classification - accepting the candidate record with the highest weight exceeding a threshold and ii) PII - imputing values from a match probability distribution. To evaluate methods for linking data with different error rates, non-random error and different match rates, we generated simulation data. Each set of simulated files was linked using both classification methods. Infection rates in the linked data were compared with those in the gold-standard data.

Results: In the original gold-standard data, 1496/20924 admissions linked to an infection. In the linked original data, PII provided the least biased results: 1481 and 1457 infections (upper/lower thresholds) compared with 1316 and 1287 (HW upper/lower thresholds). In the simulated data, substantial bias (up to 112%) was introduced when linkage error varied by hospital. Bias was also greater when the match rate was low or the identifier error rate was high, and in these cases PII performed better than HW classification at reducing bias due to false-matches.

Conclusions: This study highlights the importance of evaluating the potential impact of linkage error on results. PII can help incorporate linkage uncertainty into analysis and reduce bias due to linkage error, without requiring identifiers.
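
As a rough illustration of the HW classification described in the Methods, the sketch below scores candidate records with Fellegi-Sunter style match weights and accepts the highest-weight candidate only if it exceeds a threshold. It is a minimal sketch, not the authors' implementation: the m/u probabilities, the threshold, the record layout and all function names are hypothetical, and PII is not modelled. The `percent_bias` helper assumes bias is expressed as the percentage difference from the gold-standard count, which the abstract does not define explicitly.

```python
import math

# Hypothetical agreement probabilities (illustrative values, not from the study):
#   m = P(identifier agrees | records are a true match)
#   u = P(identifier agrees | records are not a match)
M_U = {
    "dob": (0.95, 0.01),
    "sex": (0.98, 0.50),
    "soundex": (0.90, 0.02),
}


def match_weight(rec_a, rec_b):
    """Sum log2 likelihood ratios over the linking identifiers."""
    weight = 0.0
    for field, (m, u) in M_U.items():
        if rec_a[field] == rec_b[field]:
            weight += math.log2(m / u)              # agreement weight
        else:
            weight += math.log2((1 - m) / (1 - u))  # disagreement weight
    return weight


def highest_weight_link(record, candidates, threshold):
    """Return the highest-weight candidate if it exceeds the threshold, else None."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: match_weight(record, c))
    return best if match_weight(record, best) > threshold else None


def percent_bias(estimate, gold):
    """Percentage difference of a linked-data count from the gold-standard count."""
    return 100.0 * (estimate - gold) / gold


# Hypothetical admission and laboratory records.
admission = {"dob": "2005-03-01", "sex": "F", "soundex": "S530"}
lab_records = [
    {"id": "lab-1", "dob": "2005-03-01", "sex": "F", "soundex": "S530"},
    {"id": "lab-2", "dob": "2004-11-17", "sex": "F", "soundex": "J520"},
]

link = highest_weight_link(admission, lab_records, threshold=5.0)
hw_bias = percent_bias(1316, 1496)   # HW, upper threshold, counts from the abstract
pii_bias = percent_bias(1481, 1496)  # PII, upper threshold, counts from the abstract
```

Under this definition, the counts reported in the Results correspond to an underestimate of about 12% for HW and about 1% for PII at the upper threshold.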
