4.7 Article

PhyloCorrelate: inferring bacterial gene-gene functional associations through large-scale phylogenetic profiling

Journal

BIOINFORMATICS
Volume 37, Issue 1, Pages 17-22

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btaa1105


Funding

  1. Natural Sciences and Engineering Research Council of Canada (NSERC) [RGPIN-2019-04266, RGPAS-2019-00004]


PhyloCorrelate is a computational framework for gene co-occurrence analysis that combines various co-occurrence metrics to optimize the analysis of gene associations in large phylogenomic databases, enabling gene function prediction.
Motivation: Statistical detection of co-occurring genes across genomes, known as 'phylogenetic profiling', is a powerful bioinformatic technique for inferring gene-gene functional associations. However, this can be a challenging task given the size and complexity of phylogenomic databases, difficulty in accounting for phylogenetic structure, inconsistencies in genome annotation and substantial computational requirements.

Results: We introduce PhyloCorrelate, a computational framework for gene co-occurrence analysis across large phylogenomic datasets. PhyloCorrelate implements a variety of co-occurrence metrics, including standard correlation metrics and model-based metrics that account for phylogenetic history. By combining multiple metrics, we developed an optimized score that exhibits a superior ability to link genes with overlapping GO terms and KEGG pathways, enabling gene function prediction. Using genomic and functional annotation data from the Genome Taxonomy Database and AnnoTree, we performed all-by-all comparisons of gene occurrence profiles across the bacterial tree of life, totaling 154 217 052 comparisons for 28 315 genes across 27 372 bacterial genomes. All predictions are available in an online database, which instantaneously returns the top correlated genes for any PFAM, TIGRFAM or KEGG query. In total, PhyloCorrelate detected 29 762 high-confidence associations between bacterial gene/protein pairs, and generated functional predictions for 834 DUFs and proteins of unknown function.
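
To make the idea of comparing gene occurrence profiles concrete, here is a minimal sketch of two standard co-occurrence metrics (the Jaccard index and the Pearson correlation) applied to binary presence/absence vectors. This is an illustration only, not PhyloCorrelate's implementation: the toy profiles, the gene names and the functions `jaccard`, `pearson` and `rank_partners` are all hypothetical, and the model-based metrics that account for phylogenetic history are not shown.

```python
from math import sqrt

def jaccard(a, b):
    """Jaccard index: genomes carrying both genes / genomes carrying either."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0

def pearson(a, b):
    """Pearson correlation of two equal-length presence/absence profiles."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a gene present in all or no genomes carries no signal
    return cov / sqrt(var_a * var_b)

# Hypothetical toy data: each list is one gene's presence (1) or absence (0)
# across the same six genomes, in the same order.
profiles = {
    "geneA": [1, 1, 0, 0, 1, 0],
    "geneB": [1, 1, 0, 0, 1, 1],
    "geneC": [0, 0, 1, 1, 0, 1],
}

def rank_partners(query, profiles, metric=jaccard):
    """Return the other genes sorted by decreasing score against the query."""
    scores = [
        (name, metric(profiles[query], profile))
        for name, profile in profiles.items()
        if name != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    for name, score in rank_partners("geneA", profiles):
        print(name, round(score, 3))
```

Simple metrics like these treat every genome as an independent observation, which closely related genomes are not; that is the limitation the abstract refers to as accounting for phylogenetic structure.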

