Article

Impact of phylogeny on structural contact inference from protein sequence data

Journal

JOURNAL OF THE ROYAL SOCIETY INTERFACE
Volume 20, Issue 199

Publisher

ROYAL SOC
DOI: 10.1098/rsif.2022.0707

Keywords

protein sequences; inference; contact prediction; phylogeny; modelling; data analysis

This study investigates the impact of phylogenetic correlations on contact prediction from protein sequences. The results show that global inference methods are more resilient to these correlations than local methods, which explains their success.
Local and global inference methods have been developed to infer structural contacts from multiple sequence alignments of homologous proteins. They rely on correlations in amino acid usage at contacting sites. Because homologous proteins share a common ancestry, their sequences also feature phylogenetic correlations, which can impair contact inference. We investigate this effect by generating controlled synthetic data from a minimal model where the importance of contacts and of phylogeny can be tuned. We demonstrate that global inference methods, specifically Potts models, are more resilient to phylogenetic correlations than local methods, based on covariance or mutual information. This holds whether or not phylogenetic corrections are used, and may explain the success of global methods. We analyse the roles of selection strength and of phylogenetic relatedness. We show that sites that mutate early in the phylogeny yield false positive contacts. We consider natural data and realistic synthetic data, and our findings generalize to these cases. Our results highlight the impact of phylogeny on contact prediction from protein sequences and illustrate the interplay between the rich structure of biological data and inference.
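To make the notion of a local method concrete, here is a minimal sketch of a mutual-information score between two columns of a multiple sequence alignment. It is an illustration only, not the authors' implementation: the function name `mutual_information` and the toy alignment `toy_msa` are hypothetical, and the sketch omits pseudocounts, sequence reweighting, and any phylogenetic correction.

```python
import math
from collections import Counter


def mutual_information(msa, i, j):
    """Mutual information (in nats) between columns i and j of an alignment.

    `msa` is a list of equal-length strings, one per sequence.
    Frequencies are raw counts: no pseudocounts or reweighting (assumption).
    """
    n = len(msa)
    # Single-site and pairwise amino acid counts.
    fi = Counter(seq[i] for seq in msa)
    fj = Counter(seq[j] for seq in msa)
    fij = Counter((seq[i], seq[j]) for seq in msa)

    mi = 0.0
    for (a, b), count in fij.items():
        p_ab = count / n
        p_a = fi[a] / n
        p_b = fj[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy alignment: columns 0 and 1 covary perfectly,
# while column 2 varies independently of column 0.
toy_msa = ["AKL", "AKV", "GEL", "GEV"]

mi_coupled = mutual_information(toy_msa, 0, 1)
mi_independent = mutual_information(toy_msa, 0, 2)
```

In this toy case the covarying pair scores log 2 and the independent pair scores 0. A score of this kind treats each pair of sites in isolation, so correlations inherited from shared ancestry can raise it even between sites that are not in contact.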
