
S1000: a better taxonomic name corpus for biomedical information extraction

BIOINFORMATICS (2023)

Journal

BIOINFORMATICS

Volume 39, Issue 6, Pages -

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/bioinformatics/btad369

Automated Summary

The recognition of species names in text is crucial for biomedical text mining, but current methods, including deep learning, have poor performance. This study introduces the S1000 corpus, which greatly improves the accuracy of species name recognition (F-score = 93.1%) for both deep learning and dictionary-based approaches.

Abstract

Motivation: The recognition of mentions of species names in text is a critically important task for biomedical text mining. While deep learning-based methods have made great advances in many named entity recognition tasks, results for species name recognition remain poor. We hypothesize that this is primarily due to the lack of appropriate corpora.

Results: We introduce the S1000 corpus, a comprehensive manual re-annotation and extension of the S800 corpus. We demonstrate that S1000 makes highly accurate recognition of species names possible (F-score = 93.1%), both for deep learning and dictionary-based methods.

Availability and implementation: All resources introduced in this study are available under open licenses from . The webpage contains links to a Zenodo project and three GitHub repositories associated with the study.
