Article

Exploring the boundaries: gene and protein identification in biomedical text

Journal

BMC BIOINFORMATICS
Volume 6

Publisher

BMC
DOI: 10.1186/1471-2105-6-S1-S5

Background: Good automatic information extraction tools offer hope for automatic processing of the exploding biomedical literature, and successful named entity recognition is a key component of such tools.

Methods: We present a maximum-entropy-based system incorporating a diverse set of features for identifying gene and protein names in biomedical abstracts.

Results: This system was entered in the BioCreative comparative evaluation and achieved a precision of 0.83 and a recall of 0.84 in the open evaluation, and a precision of 0.78 and a recall of 0.85 in the closed evaluation.

Conclusion: Central contributions are rich use of features derived from the training data at multiple levels of granularity, a focus on correctly identifying entity boundaries, and the innovative use of several external knowledge sources, including full MEDLINE abstracts and web searches.
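The abstract reports precision and recall separately. As a reading aid, the minimal sketch below combines each pair into a balanced F-measure (the harmonic mean of the two). The function name `f1_score` is hypothetical, and the inputs are the rounded figures quoted above, so the results are derived approximations rather than scores reported by the authors.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Rounded precision/recall pairs quoted in the abstract.
open_f1 = f1_score(0.83, 0.84)    # open evaluation
closed_f1 = f1_score(0.78, 0.85)  # closed evaluation

print(f"open evaluation F1:   {open_f1:.3f}")
print(f"closed evaluation F1: {closed_f1:.3f}")
```

With these rounded inputs, the open evaluation comes out at roughly 0.83 and the closed evaluation at roughly 0.81. The harmonic mean penalizes an imbalance between precision and recall, which is why the closed run scores lower despite its slightly higher recall.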
