Article

Characterizing and explaining the impact of disease-associated mutations in proteins without known structures or structural homologs

Journal

BRIEFINGS IN BIOINFORMATICS
Volume 23, Issue 4

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bib/bbac187

Keywords

protein structure modeling; mutation; AlphaFold; RoseTTAFold; disease-associated; functional site

Funding

  1. Biotechnology and Biological Sciences Research Council [BB/S020144/1, BB/R009597/1, BB/R014892/1, BB/S017135/1]
  2. National Science Foundation Award [DBI1937533]
  3. Audacious Project at the Institute for Protein Design

Mutations in human proteins can cause diseases. The structures of these proteins help in understanding disease mechanisms and developing therapeutics. Advanced deep learning techniques allow us to predict protein structures even without structural homologs. By modeling and extracting domains, we predicted functional sites and analyzed disease-associated mutations. Most of these mutations (80%) could be explained by proximity to functional sites, structural destabilization or predicted pathogenicity. Using both RoseTTAFold and AlphaFold models increased confidence in our predictions, and combining the two explained additional mutations.
Mutations in human proteins can lead to disease. The structures of these proteins can help in understanding the mechanisms of such diseases and in developing therapeutics against them. With improved deep learning techniques, such as RoseTTAFold and AlphaFold, we can predict the structure of proteins even in the absence of structural homologs. We modeled and extracted the domains from 553 disease-associated human proteins without known protein structures or close homologs in the Protein Data Bank. Model quality was higher, and the root mean square deviation (RMSD) between AlphaFold and RoseTTAFold models was lower, for domains that could be assigned to CATH families than for those that could only be assigned to Pfam families of unknown structure or could not be assigned to either. We predicted ligand-binding sites, protein-protein interfaces and conserved residues in these predicted structures. We then explored whether the disease-associated missense mutations were in the proximity of these predicted functional sites, whether they destabilized the protein structure based on ddG calculations, or whether they were predicted to be pathogenic. We could explain 80% of these disease-associated mutations based on proximity to functional sites, structural destabilization or pathogenicity. Compared with polymorphisms, a larger percentage of disease-associated missense mutations were buried, closer to predicted functional sites, and predicted to be destabilizing and pathogenic. Using models from the two state-of-the-art techniques provides better confidence in our predictions, and RoseTTAFold models explain 93 additional mutations that could not be explained based solely on AlphaFold models.
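
The three criteria in the abstract (proximity to a predicted functional site, destabilization by ddG, and predicted pathogenicity) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The distance and ddG cutoffs, the sign convention for ddG, the use of a single coordinate per residue, and all names below are assumptions made for illustration, since the abstract does not state them.

```python
from dataclasses import dataclass
from math import dist
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]

# Hypothetical cutoffs, chosen for illustration only (not taken from the paper).
SITE_DISTANCE_CUTOFF = 8.0  # angstroms
DDG_CUTOFF = 1.0            # kcal/mol; positive ddG assumed to mean destabilizing


@dataclass
class Mutation:
    name: str                   # e.g. "A1V" (hypothetical label)
    coord: Coord                # one representative atom of the mutated residue
    ddg: float                  # predicted change in folding free energy
    predicted_pathogenic: bool  # output of some pathogenicity predictor


def near_functional_site(mutation: Mutation, site_coords: Sequence[Coord],
                         cutoff: float = SITE_DISTANCE_CUTOFF) -> bool:
    """True if the residue lies within `cutoff` of any functional-site residue."""
    return any(dist(mutation.coord, c) <= cutoff for c in site_coords)


def explanations(mutation: Mutation, site_coords: Sequence[Coord]) -> List[str]:
    """List every criterion that the mutation satisfies (possibly none)."""
    reasons = []
    if near_functional_site(mutation, site_coords):
        reasons.append("near_functional_site")
    if mutation.ddg >= DDG_CUTOFF:
        reasons.append("destabilizing")
    if mutation.predicted_pathogenic:
        reasons.append("predicted_pathogenic")
    return reasons


def fraction_explained(mutations: Sequence[Mutation],
                       site_coords: Sequence[Coord]) -> float:
    """Fraction of mutations that satisfy at least one criterion."""
    if not mutations:
        return 0.0
    explained = sum(1 for m in mutations if explanations(m, site_coords))
    return explained / len(mutations)
```

Under these assumptions, a mutation counts as explained when at least one criterion holds. Running the same check on the coordinates from a second model (for example RoseTTAFold in addition to AlphaFold) and taking the union is one way that additional mutations could become explained.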
