☆ 4.6 Article

Domain-specific language models and lexicons for tagging

JOURNAL OF BIOMEDICAL INFORMATICS (2005)

期刊

JOURNAL OF BIOMEDICAL INFORMATICS

卷 38, 期 6, 页码 422-430

出版社

ACADEMIC PRESS INC ELSEVIER SCIENCE

DOI: 10.1016/j.jbi.2005.02.009

关键词

clinical report analysis.; part-of-speech tagging accuracy; domain adaptation; clinical information systems; biomedical domain; corpus linguistics; statistical part-of-speech tagging; hidden Markov model

类别

Computer Science, Interdisciplinary Applications Medical Informatics

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

摘要

Accurate and reliable part-of-speech tagging is useful for many Natural Language Processing (NLP) tasks that form the foundation of NLP-based approaches to information retrieval and data mining. In general, large annotated corpora are necessary to achieve desired part-of-speech tagger accuracy. We show that a large annotated general-English corpus is not sufficient for building a part-of-speech tagger model adequate for tagging documents from the medical domain. However, adding a quite small domain-specific corpus to a large general-English one boosts performance to over 92% accuracy from 87% in our studies. We also suggest a number of characteristics to quantify the similarities between a training corpus and the test data. These results give guidance for creating an appropriate corpus for building a part-of-speech tagger model that gives satisfactory accuracy results on a new domain at a relatively small cost. (c) 2005 Elsevier Inc. All rights reserved.

Domain-specific language models and lexicons for tagging

期刊

JOURNAL OF BIOMEDICAL INFORMATICS

出版社

ACADEMIC PRESS INC ELSEVIER SCIENCE

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Domain-specific language models and lexicons for tagging

期刊

JOURNAL OF BIOMEDICAL INFORMATICS

出版社

ACADEMIC PRESS INC ELSEVIER SCIENCE

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文