Article

Evaluating a variety of text-mined features for automatic protein function prediction with GOstruct

Journal

JOURNAL OF BIOMEDICAL SEMANTICS
Volume 6, Issue -, Pages -

Publisher

BMC
DOI: 10.1186/s13326-015-0006-4

Keywords

Text mining; Protein function prediction; Biomedical concept recognition

Funding

  1. NIH [2T15LM009451]
  2. NSF [DBI-0965616, DBI-0965768]
  3. NICTA - Australian Government
  4. Australian Research Council through the ICT Centre of Excellence program
  5. Direct For Biological Sciences [0965616] Funding Source: National Science Foundation
  6. Div Of Biological Infrastructure [0965768] Funding Source: National Science Foundation

Most computational methods that predict protein function do not take advantage of the large amount of information contained in the biomedical literature. In this work we evaluate both ontology term co-mention and bag-of-words features mined from the biomedical literature and analyze their impact in the context of a structured-output support vector machine model, GOstruct. We find that even simple literature-based features are useful for predicting human protein function (F-max: Molecular Function = 0.408, Biological Process = 0.461, Cellular Component = 0.608). One advantage of literature features is that they make automated predictions easy to verify. Manual inspection of misclassifications shows that some false positive predictions could be biologically valid, based on support extracted from the literature. Additionally, we present a medium-throughput pipeline that was used to annotate a large subset of co-mentions; we suggest that this strategy could help speed up the rate at which proteins are curated.
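The abstract reports F-max without defining it. The sketch below shows one common protein-centric formulation (the maximum, over score thresholds, of the harmonic mean of averaged precision and recall). It is an assumption that this matches the paper's exact evaluation protocol; the function name `f_max`, the threshold grid, and the toy term identifiers are hypothetical and chosen only for illustration.

```python
from typing import Dict, Set


def f_max(predictions: Dict[str, Dict[str, float]],
          truth: Dict[str, Set[str]],
          steps: int = 100) -> float:
    """Protein-centric F-max over a uniform grid of score thresholds.

    predictions maps protein -> {term: confidence score in [0, 1]}.
    truth maps protein -> non-empty set of true terms.
    Assumed formulation, not necessarily the paper's exact protocol.
    """
    best = 0.0
    for i in range(1, steps + 1):
        t = i / steps
        precisions = []   # only proteins with at least one prediction at t
        recall_sum = 0.0  # summed over every protein in the truth set
        for protein, true_terms in truth.items():
            scores = predictions.get(protein, {})
            predicted = {term for term, s in scores.items() if s >= t}
            hits = len(predicted & true_terms)
            if predicted:
                precisions.append(hits / len(predicted))
            recall_sum += hits / len(true_terms)
        if not precisions:
            continue  # nothing predicted at this threshold
        pr = sum(precisions) / len(precisions)
        rc = recall_sum / len(truth)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best


# Toy data with made-up term identifiers, for illustration only.
toy_predictions = {
    "P1": {"TERM_A": 0.9, "TERM_B": 0.4},
    "P2": {"TERM_C": 0.8},
}
toy_truth = {
    "P1": {"TERM_A"},
    "P2": {"TERM_C", "TERM_D"},
}
toy_score = f_max(toy_predictions, toy_truth)
```

Precision is averaged only over proteins that receive at least one prediction at a given threshold, while recall is averaged over all proteins, so a model that abstains on many proteins is penalized through recall rather than precision.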