Article

Epistatic models predict mutable sites in SARS-CoV-2 proteins and epitopes

Publisher

NATL ACAD SCIENCES
DOI: 10.1073/pnas.2113118119

Keywords

SARS-CoV-2; mutability; data-driven models; epistasis; direct coupling analysis

Funding

  1. Faculty of Science and Engineering of Sorbonne University
  2. EU H2020 Research and Innovation Programme MSCA-RISE-2016 [734439]
  3. EU H2020 Marie Sklodowska Curie Individual Fellowship (H2020-MSCA-IF-2020) [101027973]
  4. Marie Curie Actions (MSCA) [101027973]

This study predicts the mutability of positions in SARS-CoV-2 protein domains to forecast the appearance of unseen variants. Statistical models built from sequence data of other coronaviruses outperform conservation profiles in estimating SARS-CoV-2 variability. The models also agree increasingly well with the observed variability as more data accumulate over time and could assist in studying viral evolution and future viral outbreaks.
The emergence of new variants of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a major concern given their potential impact on the transmissibility and pathogenicity of the virus as well as the efficacy of therapeutic interventions. Here, we predict the mutability of all positions in SARS-CoV-2 protein domains to forecast the appearance of unseen variants. Using sequence data from other coronaviruses that predate SARS-CoV-2, we build statistical models that capture not only amino acid conservation but also more complex patterns resulting from epistasis. We show that these models are notably superior to conservation profiles in estimating the already observable SARS-CoV-2 variability. In the receptor binding domain of the spike protein, we observe that the predicted mutability correlates well with experimental measures of protein stability and that both are reliable mutability predictors (receiver operating characteristic areas under the curve of approximately 0.8). Most interestingly, we observe an increasing agreement between our model and the observed variability as more data become available over time, proving the anticipatory capacity of our model. When combined with data concerning the immune response, our approach identifies positions where current variants of concern are highly overrepresented. These results could assist studies on viral evolution and future viral outbreaks and, in particular, guide the exploration and anticipation of potentially harmful future SARS-CoV-2 variants.
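To make the evaluation described in the abstract concrete, the following is a minimal sketch of the conservation-profile baseline only: it scores each alignment column by its Shannon entropy and measures how well that score separates sites observed to mutate from sites that did not, using the area under the ROC curve. It is not the authors' epistatic (direct coupling analysis) model, which adds pairwise couplings between sites. The function names, the toy alignment, and the observed-mutation labels are all hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter


def site_entropy(column):
    """Shannon entropy (in nats) of the amino acids in one alignment column."""
    n = len(column)
    counts = Counter(column)
    return sum(-(c / n) * math.log(c / n) for c in counts.values())


def conservation_profile(alignment):
    """Per-site entropy: higher values mean a less conserved site."""
    length = len(alignment[0])
    return [site_entropy([seq[i] for seq in alignment]) for i in range(length)]


def roc_auc(scores, labels):
    """AUC as the probability that a mutated site outscores a non-mutated one.

    Ties count as half a win (the Mann-Whitney formulation of the AUC).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one mutated and one non-mutated site")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: four aligned sequences of length four,
# and made-up labels marking which sites were later observed to mutate.
toy_alignment = ["ACDE", "ACDF", "ACEG", "ACDH"]
toy_observed = [0, 1, 0, 1]

profile = conservation_profile(toy_alignment)
auc = roc_auc(profile, toy_observed)
print(profile, auc)
```

An AUC of 0.5 corresponds to a score with no predictive value and 1.0 to perfect separation. An epistatic model would replace the per-site entropy with a score that also depends on the residues present at other positions.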
