Protein function prediction by massive integration of evolutionary analyses and multiple data sources

BMC BIOINFORMATICS (2013)

Journal

BMC BIOINFORMATICS

Volume 14, Issue -, Pages -

Publisher

BMC

DOI: 10.1186/1471-2105-14-S3-S1

Funding

UK Biotechnology and Biological Sciences Research Council (DWAB)
Marie Curie Intra European Fellowship within the 7th European Community Framework Programme [PIEF-GA-2009-237292]
Biotechnology and Biological Sciences Research Council [BB/I023992/1, BB/J002925/1] Funding Source: researchfish
BBSRC [BB/J002925/1, BB/I023992/1] Funding Source: UKRI

Abstract

Background: Accurate protein function annotation is a severe bottleneck when utilizing the deluge of high-throughput, next generation sequencing data. Keeping database annotations up-to-date has become a major scientific challenge that requires the development of reliable automatic predictors of protein function. The CAFA experiment provided a unique opportunity to undertake comprehensive 'blind testing' of many diverse approaches for automated function prediction. We report on the methodology we used for this challenge and on the lessons we learnt.

Methods: Our method integrates into a single framework a wide variety of biological information sources, encompassing sequence, gene expression and protein-protein interaction data, as well as annotations in UniProt entries. The methodology transfers functional categories based on the results from complementary homology-based and feature-based analyses. We generated the final molecular function and biological process assignments by combining the initial predictions in a probabilistic manner, which takes into account the Gene Ontology hierarchical structure.

Results: We propose a novel scoring function called COmbined Graph-Information Content similarity (COGIC) score for the comparison of predicted functional categories and benchmark data. We demonstrate that our integrative approach provides increased scope and accuracy over both the component methods and the naive predictors. In line with previous studies, we find that molecular function predictions are more accurate than biological process assignments.

Conclusions: Overall, the results indicate that there is considerable room for improvement in the field. It still remains for the community to invest a great deal of effort to make automated function prediction a useful and routine component in the toolbox of life scientists. As already witnessed in other areas, community-wide blind testing experiments will be pivotal in establishing standards for the evaluation of prediction accuracy, in fostering advancements and new ideas, and ultimately in recording progress.
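
The abstract does not state the exact rule used to combine the component predictions, so the following is only an illustrative sketch of the general idea of merging per-method scores while respecting an ontology hierarchy. It assumes a noisy-OR combination of independent scores and a max-propagation step that gives every ancestor term a score at least as high as its descendants. Both choices, the toy term names, and the function names (`noisy_or`, `ancestors`, `combine`) are assumptions made for illustration, not the authors' published procedure.

```python
from typing import Dict, Iterable, List, Set

# Toy ontology mapping each term to its direct parents.
# These are hypothetical term names, not real Gene Ontology identifiers.
PARENTS: Dict[str, List[str]] = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "protein_binding": ["binding"],
}


def noisy_or(scores: Iterable[float]) -> float:
    """Combine probabilities assumed independent: 1 - prod(1 - p)."""
    remaining = 1.0
    for p in scores:
        remaining *= 1.0 - p
    return 1.0 - remaining


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return all ancestors of a term by walking parent links."""
    seen: Set[str] = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def combine(
    predictions: List[Dict[str, float]], parents: Dict[str, List[str]]
) -> Dict[str, float]:
    """Merge per-method term scores, then propagate them up the hierarchy."""
    terms: Set[str] = set()
    for prediction in predictions:
        terms.update(prediction)
    # Step 1 (assumed rule): noisy-OR over the methods that scored each term.
    combined = {
        t: noisy_or(p[t] for p in predictions if t in p) for t in terms
    }
    # Step 2 (assumed rule): an ancestor is at least as likely as any descendant.
    result = dict(combined)
    for term, score in combined.items():
        for anc in ancestors(term, parents):
            result[anc] = max(result.get(anc, 0.0), score)
    return result


# Hypothetical scores from two component methods.
homology_based = {"protein_binding": 0.6}
feature_based = {"protein_binding": 0.5, "catalysis": 0.2}
final_scores = combine([homology_based, feature_based], PARENTS)
```

In this sketch the two methods reinforce each other on the shared term, and the more general terms inherit the highest score found among their descendants, so the final assignment never rates a specific function above the broader category that contains it.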
