
Entity recognition from clinical texts via recurrent neural network

BMC MEDICAL INFORMATICS AND DECISION MAKING (2017)

Journal

BMC MEDICAL INFORMATICS AND DECISION MAKING

Volume 17, Issue -, Pages -

Publisher

BMC

DOI: 10.1186/s12911-017-0468-7

Keywords

Entity recognition; Recurrent neural network; Clinical notes; Deep learning; Sequence labeling

Category

Medical Informatics

Funding

National 863 Program of China [2015AA015405]
National Natural Science Foundation of China (NSFC) [61573118, 61402128, 61473101, 61472428]
Strategic Emerging Industry Development Special Funds of Shenzhen [JCYJ20140508161040764, JCYJ20140417172417105, JCYJ20140627163809422, JSGG20151015161015297]
Innovation Fund of Harbin Institute of Technology (HIT.NSRIF) [2017052]
Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education [93K172016K12]


Abstract

Background: Entity recognition is one of the most fundamental steps in text analysis and has long attracted considerable attention from researchers. In the clinical domain, clinical texts contain various types of entities, such as clinical entities and protected health information (PHI). Recognizing these entities has become a hot topic in clinical natural language processing (NLP), and many traditional machine learning methods, such as support vector machines and conditional random fields, have been applied to the task in the past few years. More recently, the recurrent neural network (RNN), a deep learning method that has shown great potential on many problems including named entity recognition, has also gradually been adopted for entity recognition from clinical texts.

Methods: In this paper, we comprehensively investigate the performance of LSTM (long short-term memory), a representative variant of RNN, on clinical entity recognition and protected health information recognition. The LSTM model consists of three layers: an input layer, which generates a representation of each word in a sentence; an LSTM layer, which outputs another word representation sequence that captures the context information of each word in the sentence; and an inference layer, which makes tagging decisions according to the output of the LSTM layer, that is, outputs a label sequence.

Results: Experiments conducted on the corpora of the 2010, 2012 and 2014 i2b2 NLP challenges show that LSTM achieves highest micro-average F1-scores of 85.81% on the 2010 i2b2 medical concept extraction task, 92.29% on the 2012 i2b2 clinical event detection task, and 94.37% on the 2014 i2b2 de-identification task, which is highly competitive with other state-of-the-art systems.

Conclusions: LSTM, which requires no hand-crafted features, has great potential for entity recognition from clinical texts. It outperforms traditional machine learning methods that depend on laborious feature engineering. One possible future direction is to integrate the knowledge bases that widely exist in the clinical domain into LSTM, which is part of our future work. Another is to use LSTM to recognize entities in specific formats.
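
As an illustration of how a micro-average F1-score like those reported in the Results can be derived from predicted label sequences, the following is a minimal Python sketch. It is not the authors' evaluation code. It assumes BIO-style tags and exact-match scoring (an entity counts as correct only if its boundaries and type both match), neither of which is stated in the abstract. The function names `bio_to_spans` and `micro_f1` and the toy labels are hypothetical.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type); an illustrative convention
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one sentence's BIO label sequence.

    Assumption: an I- tag that does not continue an open entity of the
    same type is ignored rather than repaired.
    """
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def micro_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-average F1: pool span counts over all sentences and entity
    types, then compute precision, recall and F1 once."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)  # exact match on boundaries and type
        n_gold += len(g)
        n_pred += len(p)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration only
gold = [["B-problem", "I-problem", "O", "B-test"], ["O", "B-treatment", "O"]]
pred = [["B-problem", "I-problem", "O", "O"], ["O", "B-treatment", "I-treatment"]]
print(round(micro_f1(gold, pred), 4))
```

In this toy case one of the two predicted spans matches one of the three gold spans, so precision is 1/2 and recall is 1/3. Because counts are pooled before averaging, frequent entity types weigh more heavily in the final score than rare ones.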
