Article

Text mining for the biocuration workflow

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/database/bas020

Funding

  1. National Science Foundation [IIS-0844419, DBI-0849977]
  2. US National Institutes of Health National Library of Medicine [1G08LM10720-01]
  3. US National Science Foundation [DBI-0850319, DBI-0850219]
  4. US National Institute of General Medical Sciences [R01-GM083871]
  5. European Union [222886-2]
  6. US National Science Foundation IGERT [0221625]
  7. PhRMA Foundation
  8. US National Human Genome Research Institute [HG001315]
  9. National Institutes of Health (NIH) [2U01HG02712-04]
  10. European Commission [021902RII3, 2007-223411]
  11. National Institute of Environmental Health Sciences (NIEHS) [R01ES014065-04S1]
  12. National Library of Medicine (NLM) [R01ES014065]
  13. National Institutes of Health National Center for Research Resources [P20RR016463, 1R01RR024031]
  14. Biotechnology and Biological Sciences Research Council of the UK [BB/F010486/1]
  15. MITRE Corporation
  16. BBSRC [BB/F010486/1] Funding Source: UKRI
  17. Biotechnology and Biological Sciences Research Council [BB/F010486/1] Funding Source: researchfish
  18. Division Of Graduate Education
  19. Direct For Education and Human Resources [0221625] Funding Source: National Science Foundation
  20. Div Of Biological Infrastructure
  21. Direct For Biological Sciences [0850319, 0850219, 0849977] Funding Source: National Science Foundation

Molecular biology has become heavily dependent on biological knowledge encoded in expert-curated biological databases. As the volume of biological literature increases, biocurators need help keeping up with it; (semi-)automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes in improving biocuration throughput using text mining. Our initial investigations took place for the workshop on 'Text Mining for the BioCuration Workflow' at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also varied considerably. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provides a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community.
