Article

Improving the performance of dictionary-based approaches in protein name recognition

JOURNAL OF BIOMEDICAL INFORMATICS (2004)

Journal

JOURNAL OF BIOMEDICAL INFORMATICS

Volume 37, Issue 6, Pages 461-470

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE

DOI: 10.1016/j.jbi.2004.08.003

Keywords

protein name recognition; naive Bayes classifier; approximate string search; spelling variant generator

Categories

Computer Science, Interdisciplinary Applications; Medical Informatics

Abstract

Dictionary-based protein name recognition is often the first step in extracting information from biomedical documents because it can provide ID information for recognized terms. However, dictionary-based approaches present two fundamental difficulties: (1) false recognition, mainly caused by short names; and (2) low recall due to spelling variations. In this paper, we tackle the former problem by using machine learning to filter out false positives, and we present two alternative methods for alleviating the latter problem of spelling variations. The first uses approximate string searching, and the second expands the dictionary with a probabilistic variant generator, which we propose in this paper. Experimental results using the GENIA corpus revealed that filtering with a naive Bayes classifier greatly improved precision with only a slight loss of recall, resulting in a 10.8% improvement in F-measure, and dictionary expansion with the variant generator gave a further 1.6% improvement and achieved an F-measure of 66.6%. © 2004 Elsevier Inc. All rights reserved.
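
To make the idea of approximate string searching against a dictionary concrete, the following is a minimal Python sketch. It is an illustration under stated assumptions and not the method described in the paper: it uses plain Levenshtein distance with unit costs, a linear scan over the dictionary, and a hypothetical toy dictionary whose names and IDs (`TOY_DICTIONARY`, `ID_0001`, `ID_0002`) are invented for this example. The function names `edit_distance` and `approximate_lookup` are likewise assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def approximate_lookup(candidate, dictionary, max_distance=1):
    """Return (name, protein_id, distance) for every dictionary entry whose
    lowercased form is within max_distance edits of the lowercased candidate,
    closest matches first."""
    key = candidate.lower()
    hits = []
    for name, protein_id in dictionary.items():
        distance = edit_distance(key, name.lower())
        if distance <= max_distance:
            hits.append((name, protein_id, distance))
    return sorted(hits, key=lambda hit: hit[2])


# Hypothetical toy dictionary: names and IDs are illustrative only.
TOY_DICTIONARY = {
    "NF-kappa B": "ID_0001",
    "interleukin 2": "ID_0002",
}

if __name__ == "__main__":
    # A spelling variant that an exact-match lookup would miss.
    print(approximate_lookup("NF kappa B", TOY_DICTIONARY))
```

In this sketch the variant "NF kappa B" still resolves to the entry "NF-kappa B" because the two differ by a single substitution. Loosening `max_distance` would likely recover more variants at the cost of more false matches, especially for short names, which is the kind of precision problem the abstract addresses with a filtering step.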
