Article

LocText: relation extraction of protein localizations to assist database curation

Journal

BMC BIOINFORMATICS
Volume 19, Issue -, Pages -

Publisher

BMC
DOI: 10.1186/s12859-018-2021-9

Keywords

Relation extraction; Text mining; Protein; Subcellular localization; GO; Annotations; Database curation

Funding

  1. Alexander von Humboldt Foundation through German Federal Ministry for Education and Research
  2. Novo Nordisk Foundation Center for Protein Research [NNF14CC0001]
  3. German Research Foundation (DFG)
  4. Technical University of Munich (TUM)

Background: The subcellular localization of a protein is an important aspect of its function. However, the experimental annotation of locations is incomplete even for well-studied model organisms. Text mining might help database curators add experimental annotations from the scientific literature. Existing extraction methods have difficulty distinguishing relationships between proteins and cellular locations that are co-mentioned in the same sentence.

Results: LocText was created as a new method to extract protein locations from abstracts and full texts. LocText learned patterns from syntax parse trees and was trained and evaluated on a newly improved LocTextCorpus. Combined with an automatic named-entity recognizer, LocText achieved high precision (P = 86% ± 4). After completing development, we mined the latest research publications for three organisms: human (Homo sapiens), budding yeast (Saccharomyces cerevisiae), and thale cress (Arabidopsis thaliana). Examining 60 novel, text-mined annotations, we found that 65% (human), 85% (yeast), and 80% (cress) were correct. Of all validated annotations, 40% were completely novel, i.e., they appeared in neither the annotations nor the text descriptions of Swiss-Prot.

Conclusions: LocText provides a cost-effective, semi-automated workflow to assist database curators in identifying novel protein localization annotations. The annotations suggested through text mining would be verified by experts to guarantee the high quality standards of manually curated databases such as Swiss-Prot.
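The difficulty with co-mentioned proteins and locations can be illustrated with a minimal sketch. This is not LocText's method (which, per the abstract, learns patterns from syntax parse trees); it is a naive co-mention baseline written only for illustration. The `Entity` type, the `comention_candidates` function, and the example sentence with the protein names `ProtA` and `ProtB` are all hypothetical.

```python
from itertools import product
from typing import NamedTuple


class Entity(NamedTuple):
    """A tagged mention in a sentence (hypothetical structure for illustration)."""
    text: str
    kind: str  # "protein" or "location"


def comention_candidates(entities):
    """Pair every protein with every location mentioned in the same sentence."""
    proteins = [e for e in entities if e.kind == "protein"]
    locations = [e for e in entities if e.kind == "location"]
    return [(p.text, loc.text) for p, loc in product(proteins, locations)]


# Hypothetical sentence:
# "ProtA localizes to the nucleus, whereas ProtB stays in the cytoplasm."
sentence_entities = [
    Entity("ProtA", "protein"),
    Entity("nucleus", "location"),
    Entity("ProtB", "protein"),
    Entity("cytoplasm", "location"),
]

candidates = comention_candidates(sentence_entities)

# Relations the hypothetical sentence actually states.
stated = {("ProtA", "nucleus"), ("ProtB", "cytoplasm")}

# Fraction of co-mention candidates that are real relations in this sentence.
precision = sum(c in stated for c in candidates) / len(candidates)
print(candidates)
print(precision)
```

In this toy sentence, co-mention proposes every protein-location pair, including the two pairs the sentence does not assert, which is why a relation extractor needs sentence structure rather than co-occurrence alone.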
