Article

Improving automatic GO annotation with semantic similarity

Journal

BMC BIOINFORMATICS
Volume 23, Issue SUPPL 2, Pages -

Publisher

BMC
DOI: 10.1186/s12859-022-04958-7

Keywords

Protein function annotation; Domain similarity network; Gene ontology annotation; Label propagation; Semantic similarity; GrAPFI; K-nearest neighbor

Funding

  1. FIGHT-HF project [PRC2243, ANR-15-RHUS-0004]
  2. FEDER-Contrat de Plan Etat Region Sante-IT2MP
  3. University of Lorraine, France
  4. Department of Computer Science and Engineering in Khulna University of Engineering & Technology, Khulna, Bangladesh
  5. Khulna University of Engineering & Technology, Khulna, Bangladesh
  6. Inria research center Nancy Grand-Est

This paper introduces the GrAPFI-GO method, which enhances the automatic annotation performance of protein functions by integrating semantic similarity and hierarchical relations between GO terms in a graph. Experimental results suggest that the proposed semantic hierarchical post-processing potentially improves the performance of GrAPFI-GO and other annotation tools as well.

Background

Automatic functional annotation of proteins is an open research problem in bioinformatics. The growing number of protein entries in public databases, for example in UniProtKB, poses challenges for manual functional annotation. Manual annotation requires expert human curators to search and read related research articles, interpret the results, and assign the annotations to the proteins. It is therefore a time-consuming and expensive process. Designing computational tools that perform automatic annotation by leveraging the high-quality manual annotations that already exist in UniProtKB/SwissProt is thus an important research problem.

Results

In this paper, we extend and adapt the GrAPFI (graph-based automatic protein function inference) method (Sarker et al. in BMC Bioinform 21, 2020; Sarker et al., in: Proceedings of 7th international conference on complex networks and their applications, Cambridge, 2018) for automatic annotation of proteins with gene ontology (GO) terms, renaming it GrAPFI-GO. The original GrAPFI method uses label propagation in a similarity graph where proteins are linked through the domains, families, and superfamilies that they share. Here, we also explore various types of similarity measures based on common neighbors in the graph. Moreover, GO terms are arranged in a hierarchical manner according to semantic parent-child relations. Therefore, we propose an efficient pruning and post-processing technique that integrates both semantic similarity and hierarchical relations between the GO terms. We produce experimental results comparing the GrAPFI-GO method with and without considering common neighbors similarity. We also test the performance of GrAPFI-GO and other annotation tools for GO annotation on a benchmark of proteins, with and without the proposed pruning and post-processing procedure.

Conclusion

Our results show that the proposed semantic hierarchical post-processing potentially improves the performance of GrAPFI-GO and of other annotation tools as well. Thus, GrAPFI-GO exposes an original, efficient, and reusable procedure to exploit the semantic relations among the GO terms in order to improve the automatic annotation of protein functions.
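
As a rough illustration of the two ideas named in the abstract (propagating labels from similar proteins, then pruning predictions using the term hierarchy), the following minimal Python sketch may help. It is not the GrAPFI-GO implementation. The Jaccard similarity over shared domains, the weighted k-nearest-neighbor vote, the 0.5 score threshold, and the "drop ancestors of a more specific predicted term" rule are all assumptions chosen for brevity, and the protein, domain, and term names are hypothetical toy labels rather than real identifiers. The semantic-similarity part of the paper's post-processing is not shown.

```python
from collections import defaultdict

def jaccard(a, b):
    """Jaccard similarity between two sets of domain identifiers (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def score_terms(query_domains, annotated, k=2):
    """Weighted vote over the k annotated proteins most similar to the query."""
    neighbors = sorted(
        ((jaccard(query_domains, doms), terms) for doms, terms in annotated.values()),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbors)
    if total == 0:
        return {}
    scores = defaultdict(float)
    for sim, terms in neighbors:
        for term in terms:
            scores[term] += sim / total  # each neighbor votes with its normalized similarity
    return dict(scores)

def ancestors(term, parents):
    """All ancestors of a term, following parent links in a toy hierarchy."""
    seen, stack = set(), list(parents.get(term, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(parents.get(parent, ()))
    return seen

def prune_to_most_specific(terms, parents):
    """Drop any predicted term that is an ancestor of another predicted term."""
    redundant = set()
    for term in terms:
        redundant |= ancestors(term, parents)
    return {term for term in terms if term not in redundant}

# Hypothetical toy data: protein -> (domain set, annotated term set).
annotated = {
    "P1": ({"D1", "D2"}, {"binding", "atp_binding"}),
    "P2": ({"D2", "D3"}, {"binding"}),
    "P3": ({"D4"}, {"catalysis"}),
}
# Hypothetical child -> parents relation standing in for the GO hierarchy.
parents = {
    "atp_binding": {"binding"},
    "binding": {"molecular_function"},
    "catalysis": {"molecular_function"},
}

scores = score_terms({"D1", "D2", "D3"}, annotated, k=2)
predicted = {term for term, score in scores.items() if score >= 0.5}
final = prune_to_most_specific(predicted, parents)
print(scores, final)
```

In this toy run, the query shares domains with P1 and P2 only, so both "binding" and "atp_binding" pass the threshold, and the pruning step keeps just the more specific "atp_binding" because "binding" is its ancestor. Whether to keep the most specific terms or to propagate scores up to ancestors is a design choice, and the paper's actual procedure may differ.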
