4.7 Article

Exploring Species-Based Strategies for Gene Normalization

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TCBB.2010.48

Keywords

Biomedical natural language processing; information extraction; gene normalization; text mining

Funding

  1. NIH [5R01LM009254, 2R01LM008111, 1R01LM010120-01]

We introduce a system developed for the BioCreative II.5 community evaluation of information extraction of proteins and protein interactions. The paper focuses primarily on the gene normalization task of recognizing protein mentions in text and mapping them to the appropriate database identifiers based on contextual clues. We outline a fuzzy dictionary lookup approach to protein mention detection that matches regularized text to similarly regularized dictionary entries. We describe several different strategies for gene normalization that focus on species or organism mentions in the text, both globally throughout the document and locally in the immediate vicinity of a protein mention, and present the results of experimentation with a series of system variations that explore the effectiveness of the various normalization strategies, as well as the role of external knowledge sources. While our system was neither the best nor the worst performing system in the evaluation, the gene normalization strategies show promise and the system affords the opportunity to explore some of the variables affecting performance on the BCII.5 tasks.
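
The two ideas named in the abstract, matching regularized text against regularized dictionary entries and choosing among candidate identifiers by local or global species mentions, can be illustrated with a minimal Python sketch. This is not the authors' system. The regularization rule (lowercase, strip non-alphanumerics), the toy lexicon, the placeholder identifiers such as `ID_HUMAN_1`, the species cue words, and the five-token window are all assumptions made for illustration, and the sketch only handles single-token mentions.

```python
import re
from collections import Counter


def regularize(text):
    """Assumed regularization: lowercase and drop non-alphanumeric characters."""
    return re.sub(r"[^a-z0-9]+", "", text.lower())


# Hypothetical toy lexicon: (species, placeholder identifier, name).
LEXICON = [
    ("human", "ID_HUMAN_1", "IL-2"),
    ("mouse", "ID_MOUSE_1", "Il2"),
    ("human", "ID_HUMAN_2", "p53"),
]

# Hypothetical species cue words mapped to a species label.
SPECIES_TERMS = {"human": "human", "mouse": "mouse", "mice": "mouse", "murine": "mouse"}


def build_index(lexicon):
    """Map each regularized name to its (species, identifier) candidates."""
    index = {}
    for species, identifier, name in lexicon:
        index.setdefault(regularize(name), []).append((species, identifier))
    return index


INDEX = build_index(LEXICON)


def tokenize(text):
    """Split into alphanumeric tokens, keeping internal hyphens (e.g. 'il-2')."""
    return re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", text)


def species_in(tokens):
    """Return the species labels cued by the given tokens, in order."""
    return [SPECIES_TERMS[t.lower()] for t in tokens if t.lower() in SPECIES_TERMS]


def normalize(text, index, window=5):
    """Return (mention, identifier or None) for each token found in the index.

    Species cues within `window` tokens of the mention are tried first (local
    strategy), then species ranked by document-wide frequency (global strategy).
    """
    tokens = tokenize(text)
    global_counts = Counter(species_in(tokens))
    results = []
    for i, token in enumerate(tokens):
        candidates = index.get(regularize(token))
        if not candidates:
            continue
        local = species_in(tokens[max(0, i - window): i + window + 1])
        preferred = local + [s for s, _ in global_counts.most_common()]
        chosen = None
        for species in preferred:
            matches = [ident for cand, ident in candidates if cand == species]
            if matches:
                chosen = matches[0]
                break
        # With no usable species cue, accept only an unambiguous candidate.
        if chosen is None and len(candidates) == 1:
            chosen = candidates[0][1]
        results.append((token, chosen))
    return results
```

For example, `normalize("Murine IL2 was induced.", INDEX)` resolves the mention through the nearby cue "Murine", while a mention with no species cue anywhere in the text and more than one candidate is left unresolved (`None`) rather than guessed.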
