Article

Coreference analysis in clinical notes: a multi-pass sieve with alternate anaphora resolution modules

JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION (2012)

Journal

JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION

Volume 19, Issue 5, Pages 867-874

Publisher

OXFORD UNIV PRESS

DOI: 10.1136/amiajnl-2011-000766

Categories

Computer Science, Information Systems; Computer Science, Interdisciplinary Applications; Health Care Sciences & Services; Information Science & Library Science; Medical Informatics

Funding

National Science Foundation [ABI:0845523]
National Institutes of Health [R01LM009959A1]

Abstract

Objective: This paper describes the coreference resolution system submitted by Mayo Clinic for Track 1C of the 2011 i2b2/VA/Cincinnati shared task. The goal of the task was to build a system that links the markables referring to the same entity.

Materials and methods: The task organizers provided progress notes and discharge summaries annotated with markables of five types: treatment, problem, test, person, and pronoun. We used a multi-pass sieve algorithm that applies deterministic rules in order of decreasing precision while gathering information about the entities in each document. Our system, MedCoref, can also use a state-of-the-art machine learning framework in place of the final, rule-based pronoun resolution sieve.

Results: The best multi-pass sieve configuration achieved an overall score (the average of the B-cubed, MUC, BLANC, and CEAF F scores) of 0.836 on the training set and 0.843 on the test set.

Discussion: A supervised machine learning system, which typically uses a single function to find coreferents, cannot accommodate the irregularities encountered in the data, especially when training examples are scarce. Conversely, a completely deterministic system can lose recall (sensitivity) when its rules are not exhaustive. The sieve-based framework allows reliable machine learning components to be combined with rules designed by experts.

Conclusion: An effective coreference resolution system can be designed using relatively simple rules, part-of-speech information, and semantic type properties. The source code of the system described is available at https://sourceforge.net/projects/ohnlp/files/MedCoref.
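The multi-pass sieve idea in the abstract can be illustrated with a minimal Python sketch. This is not the MedCoref code: the three rules (`exact_match`, `head_match`, `rule_pronoun`), the `scored_pronoun` wrapper, the `resolve` function, and the toy markables are all hypothetical stand-ins chosen for illustration. The sketch shows only the general pattern: rules run from most to least precise, every pass sees the entity chains built by earlier passes, and the final pronoun pass can be swapped for a scorer-based one.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass(frozen=True)
class Markable:
    text: str
    sem_type: str  # "problem", "treatment", "test", "person", or "pronoun"

# A sieve answers one question: should `mention` link to the earlier `antecedent`?
Sieve = Callable[[Markable, Markable], bool]

PERSON_PRONOUNS = {"he", "she", "him", "her", "his"}  # illustrative, not exhaustive

def exact_match(mention: Markable, antecedent: Markable) -> bool:
    """Most precise pass (assumed rule): same semantic type, identical string."""
    return (mention.sem_type != "pronoun"
            and mention.sem_type == antecedent.sem_type
            and mention.text.lower() == antecedent.text.lower())

def head_match(mention: Markable, antecedent: Markable) -> bool:
    """Less precise pass (assumed rule): same type and same final token."""
    return (mention.sem_type != "pronoun"
            and mention.sem_type == antecedent.sem_type
            and mention.text.lower().split()[-1] == antecedent.text.lower().split()[-1])

def rule_pronoun(mention: Markable, antecedent: Markable) -> bool:
    """Final rule-based pass (assumed rule): personal pronoun links to a person."""
    return (mention.sem_type == "pronoun"
            and mention.text.lower() in PERSON_PRONOUNS
            and antecedent.sem_type == "person")

def scored_pronoun(score: Callable[[Markable, Markable], float],
                   threshold: float = 0.5) -> Sieve:
    """Wrap any pairwise scorer (e.g. a trained classifier) as a drop-in sieve."""
    def sieve(mention: Markable, antecedent: Markable) -> bool:
        return mention.sem_type == "pronoun" and score(mention, antecedent) >= threshold
    return sieve

def resolve(markables: List[Markable], sieves: List[Sieve]) -> List[List[int]]:
    """Apply sieves in the given order; return chains of markable indices."""
    parent = list(range(len(markables)))  # union-find shared by all passes

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for sieve in sieves:                       # one pass per sieve
        for i, mention in enumerate(markables):
            for j in range(i - 1, -1, -1):     # nearest antecedent first
                ri, rj = find(i), find(j)
                if ri == rj:                   # already in the same chain
                    continue
                if sieve(mention, markables[j]):
                    parent[max(ri, rj)] = min(ri, rj)
                    break
    chains = {}
    for i in range(len(markables)):
        chains.setdefault(find(i), []).append(i)
    return sorted(chains.values())

notes = [
    Markable("Mr. Smith", "person"),
    Markable("chest pain", "problem"),
    Markable("He", "pronoun"),
    Markable("the chest pain", "problem"),
    Markable("aspirin", "treatment"),
    Markable("Chest pain", "problem"),
]
chains = resolve(notes, [exact_match, head_match, rule_pronoun])
print(chains)  # [[0, 2], [1, 3, 5], [4]]
```

Replacing `rule_pronoun` with `scored_pronoun(some_scorer)` changes only the last pass, which mirrors the abstract's point that a learned pronoun module can stand in for the final rule-based sieve without touching the earlier, higher-precision rules.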
