4.7 Article

Using discordance to improve classification in narrative clinical databases: An application to community-acquired pneumonia

Journal

COMPUTERS IN BIOLOGY AND MEDICINE
Volume 37, Issue 3, Pages 296-304

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.compbiomed.2006.02.001

Keywords

data mining; natural language processing; similarity metrics; electronic medical records; classification; community-acquired pneumonia; discordance

Funding

  1. NLM NIH HHS [R01 LM06910, R01 LM006910] Funding Source: Medline

Data mining in electronic medical records may facilitate clinical research, but much of the structured data may be miscoded, incomplete, or non-specific. Exploiting narrative data with natural language processing may help, although nesting, varying granularity, and repetition remain challenges. In a study of community-acquired pneumonia using electronic records, these issues led to poor classification. Limiting queries to accurate, complete records led to vastly reduced, possibly biased samples. We exploited knowledge latent in the electronic records to improve classification. A similarity metric was used to cluster cases. We defined discordance as the degree to which cases within a cluster give different answers for some query that addresses a classification task of interest. Cases with higher discordance are more likely to be incorrectly classified, and can be reviewed manually to adjust the classification, improve the query, or estimate the likely accuracy of the query. In a study of pneumonia, in which the ICD9-CM coding was found to be very poor, the discordance measure was statistically significantly correlated with classification correctness (0.45; 95% CI 0.15-0.62). (c) 2006 Elsevier Ltd. All rights reserved.
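
The abstract does not give a formula for discordance, so the following is only a minimal sketch under stated assumptions: cluster assignments are taken as already computed by some similarity metric, the query returns a yes/no answer per case, and a case's discordance is taken to be the fraction of its cluster-mates whose answer differs from its own. The function name `discordance_by_case` and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def discordance_by_case(cluster_of, answer_of):
    """Return, for each case, the fraction of its cluster-mates whose
    query answer differs from its own (an assumed definition)."""
    # Group case identifiers by their cluster label.
    members = defaultdict(list)
    for case, cluster in cluster_of.items():
        members[cluster].append(case)

    scores = {}
    for cases in members.values():
        for case in cases:
            others = [c for c in cases if c != case]
            if not others:
                # A singleton cluster has no one to disagree with.
                scores[case] = 0.0
                continue
            differing = sum(answer_of[c] != answer_of[case] for c in others)
            scores[case] = differing / len(others)
    return scores


# Hypothetical toy data: six cases in two clusters, with the query's
# yes/no classification for each case.
cluster_of = {"a": 0, "b": 0, "c": 0, "d": 0, "e": 1, "f": 1}
answer_of = {"a": True, "b": True, "c": True, "d": False, "e": False, "f": False}

scores = discordance_by_case(cluster_of, answer_of)

# Cases with the highest discordance are the first candidates for manual review.
review_queue = sorted(scores, key=scores.get, reverse=True)
```

In this toy example, case `d` disagrees with every other member of its cluster, so it lands at the front of `review_queue`. A real application would need to choose the similarity metric, the clustering method, and the handling of non-binary answers, none of which this sketch settles.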
