Journal
JOURNAL OF BIOMEDICAL INFORMATICS
Volume 58, Pages S39-S46
Publisher
ACADEMIC PRESS INC ELSEVIER SCIENCE
DOI: 10.1016/j.jbi.2015.08.012
Keywords
Protected health information; De-identification; Medical records; Conditional random fields
Funding
- NIH NLM [2U54LM008748, 5R13LM011411]
- NIH NIGMS [5R01GM102282]
De-identification is a shared task of the 2014 i2b2/UTHealth challenge. Its purpose is to remove protected health information (PHI) from medical records. In this paper, we propose a novel de-identifier, WI-deId, based on conditional random fields (CRFs). We introduce a preprocessing module that tokenizes the medical records using regular expressions and an off-the-shelf tokenizer, and we extract three groups of features to train the de-identifier model. Experiments show that the system is effective at de-identifying medical records, achieving a micro-F1 of 0.9232 at the i2b2 strict entity evaluation level. (C) 2015 Elsevier Inc. All rights reserved.
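The pipeline described in the abstract (regular-expression tokenization, per-token features for a CRF, and strict entity-level micro-F1) might be sketched roughly as follows. This is only an illustration under stated assumptions: the token pattern, the feature names, and the helper functions `tokenize`, `token_features`, and `strict_micro_f1` are hypothetical and are not taken from the paper, whose three feature groups are not specified in this record. The scoring function assumes that "strict" means a predicted entity counts only when its character span and PHI type both match a gold entity exactly.

```python
import re

# Hypothetical token pattern (an assumption, not the paper's rules):
# digit runs joined by separators (dates, phone-like strings), words, or single punctuation marks.
TOKEN_RE = re.compile(r"\d+(?:[/\-.:]\d+)*|[A-Za-z]+|[^\sA-Za-z\d]")


def tokenize(text):
    """Return (token, start, end) triples so labels can be mapped back to character offsets."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


def word_shape(word):
    """Collapse characters into classes: uppercase -> A, lowercase -> a, digit -> 9."""
    shape = re.sub(r"[A-Z]", "A", word)
    shape = re.sub(r"[a-z]", "a", shape)
    return re.sub(r"\d", "9", shape)


def token_features(tokens, i):
    """Illustrative features for token i; a CRF toolkit would consume one such dict per token."""
    word = tokens[i][0]
    return {
        "lower": word.lower(),
        "is_title": word.istitle(),
        "is_digit": word.isdigit(),
        "shape": word_shape(word),
        "prev": tokens[i - 1][0].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1][0].lower() if i + 1 < len(tokens) else "<EOS>",
    }


def strict_micro_f1(gold, pred):
    """Micro-averaged F1 over sets of (doc_id, start, end, phi_type) tuples, exact match only."""
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    toks = tokenize("Seen on 03/12/2014 by Dr. Smith")
    print([t[0] for t in toks])
    print(token_features(toks, 2))
    gold = {("d1", 8, 18, "DATE"), ("d1", 26, 31, "DOCTOR")}
    pred = {("d1", 8, 18, "DATE"), ("d1", 22, 31, "DOCTOR")}  # second span is too wide
    print(strict_micro_f1(gold, pred))
```

Keeping character offsets alongside each token is a deliberate choice in this sketch: strict evaluation compares spans in the original text, so token-level predictions need a way back to exact start and end positions.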