4.6 Article

Improving automatic GO annotation with semantic similarity

Journal

BMC BIOINFORMATICS
Volume 23, Issue SUPPL 2, Pages -

Publisher

BMC
DOI: 10.1186/s12859-022-04958-7

Keywords

Protein function annotation; Domain similarity network; Gene ontology annotation; Label propagation; Semantic similarity; GrAPFI; K-nearest neighbor

Funding

  1. FIGHT-HF project [PRC2243, ANR-15-RHUS-0004]
  2. FEDER-Contrat de Plan Etat Region Sante-IT2MP
  3. University of Lorraine, France
  4. Department of Computer Science and Engineering in Khulna University of Engineering & Technology, Khulna, Bangladesh
  5. Khulna University of Engineering & Technology, Khulna, Bangladesh
  6. Inria research center Nancy Grand-Est

This paper introduces the GrAPFI-GO method, which enhances the automatic annotation performance of protein functions by integrating semantic similarity and hierarchical relations between GO terms in a graph. Experimental results suggest that the proposed semantic hierarchical post-processing potentially improves the performance of GrAPFI-GO and other annotation tools as well.
Background

Automatic functional annotation of proteins is an open research problem in bioinformatics. The growing number of protein entries in public databases, for example in UniProtKB, poses challenges for manual functional annotation. Manual annotation requires expert human curators to search and read related research articles, interpret the results, and assign the annotations to the proteins. It is therefore a time-consuming and expensive process. Designing computational tools that perform automatic annotation by leveraging the high-quality manual annotations that already exist in UniProtKB/SwissProt is thus an important research problem.

Results

In this paper, we extend and adapt the GrAPFI (graph-based automatic protein function inference) method (Sarker et al. in BMC Bioinform 21, 2020; Sarker et al., in: Proceedings of 7th international conference on complex networks and their applications, Cambridge, 2018) for automatic annotation of proteins with gene ontology (GO) terms, renaming it GrAPFI-GO. The original GrAPFI method uses label propagation in a similarity graph where proteins are linked through the domains, families, and superfamilies that they share. Here, we also explore various types of similarity measures based on common neighbors in the graph. Moreover, GO terms are arranged hierarchically according to semantic parent-child relations. We therefore propose an efficient pruning and post-processing technique that integrates both semantic similarity and hierarchical relations between the GO terms. We produce experimental results comparing the GrAPFI-GO method with and without common-neighbor similarity. We also test the performance of GrAPFI-GO and other GO annotation tools on a benchmark of proteins, with and without the proposed pruning and post-processing procedure.

Conclusion

Our results show that the proposed semantic hierarchical post-processing potentially improves the performance of GrAPFI-GO and of other annotation tools as well. Thus, GrAPFI-GO offers an original, efficient, and reusable procedure for exploiting the semantic relations among GO terms in order to improve the automatic annotation of protein functions.
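The abstract describes two ideas that a toy example can make concrete: propagating GO labels from neighboring proteins in a domain-similarity graph, and pruning the predictions using the parent-child hierarchy of GO terms. The Python sketch below is only an illustration under stated assumptions, not the authors' implementation. It assumes Jaccard similarity over shared domain identifiers as the edge weight, similarity-weighted voting as the propagation rule, and a pruning step that keeps only the most specific predicted terms. The semantic-similarity component of the published post-processing is not modeled here. All function names, protein IDs, domain IDs, and GO labels are hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity between two sets of domain identifiers."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def propagate_labels(query_domains, annotated):
    """Score GO terms for a query protein by similarity-weighted voting.

    annotated maps protein_id -> (domain set, GO term set).
    Assumption: edge weight is the Jaccard similarity of domain sets.
    """
    scores = defaultdict(float)
    total = 0.0
    for domains, go_terms in annotated.values():
        weight = jaccard(query_domains, domains)
        if weight <= 0.0:
            continue  # no shared domain, so no edge in the graph
        total += weight
        for term in go_terms:
            scores[term] += weight
    # Normalize so each score lies in [0, 1].
    return {t: s / total for t, s in scores.items()} if total else {}


def ancestors(term, parents):
    """Collect all ancestors of a term; parents maps child -> set of parents."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def prune_to_most_specific(scores, parents, threshold):
    """Keep terms scoring at least threshold, then drop any kept term that
    is an ancestor of another kept term (it is implied by the descendant)."""
    kept = {t for t, s in scores.items() if s >= threshold}
    implied = set()
    for term in kept:
        implied |= ancestors(term, parents)
    return kept - implied


# Toy data (hypothetical identifiers, not real UniProtKB or GO entries).
annotated = {
    "P1": ({"D1", "D2"}, {"GO:child", "GO:parent"}),
    "P2": ({"D1"}, {"GO:parent"}),
    "P3": ({"D3"}, {"GO:other"}),
}
parents = {"GO:child": {"GO:parent"}, "GO:parent": {"GO:root"}}

scores = propagate_labels({"D1", "D2"}, annotated)
predicted = prune_to_most_specific(scores, parents, threshold=0.5)
print(scores)
print(predicted)
```

In this toy case the query shares domains with P1 and P2 but not P3, so only their GO terms receive scores, and the pruning step keeps the more specific term while dropping the ancestor it implies. Whether to keep the most specific terms or instead expand predictions to their ancestors is a design choice that depends on the evaluation protocol.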
