4.7 Article

Deep Learning to Predict Protein Backbone Structure from High-Resolution Cryo-EM Density Maps

Journal

SCIENTIFIC REPORTS
Volume 10, Issue 1

Publisher

NATURE PORTFOLIO
DOI: 10.1038/s41598-020-60598-y

Funding

  1. University of Washington Bothell [74-0525]
  2. NIH [P41-GM103311]
  3. Graduate Research Award of Computing and Software Systems division of the University of Washington Bothell

Cryo-electron microscopy (cryo-EM) has become a leading technology for determining protein structures. Recent advances in this field have allowed for atomic resolution. However, predicting the backbone trace of a protein has remained a challenge on all but the most pristine density maps (<2.5 Å resolution). Here we introduce a deep learning model that uses a set of cascaded convolutional neural networks (CNNs) to predict Cα atoms along a protein's backbone structure. The cascaded-CNN (C-CNN) is a novel deep learning architecture composed of multiple CNNs, each predicting a specific aspect of a protein's structure. This model predicts secondary structure elements (SSEs), backbone structure, and Cα atoms, combining the results of each to produce a complete prediction map. The cascaded-CNN is a semantic segmentation image classifier and was trained using thousands of simulated density maps. This method is largely automatic and only requires a recommended threshold value for each protein density map. A specialized tabu-search path-walking algorithm was used to produce an initial backbone trace with Cα placements. A helix-refinement algorithm made further improvements to the α-helix SSEs of the backbone trace. Finally, a novel quality assessment-based combinatorial algorithm was used to map protein sequences onto Cα traces to obtain full-atom protein structures. This method was tested on 50 experimental maps between 2.6 Å and 4.4 Å resolution. It outperformed several state-of-the-art prediction methods, including Rosetta de-novo, MAINMAST, and a Phenix-based method, by producing the most complete predicted protein structures, as measured by the percentage of found Cα atoms. This method accurately predicted 88.9% (mean) of the Cα atoms within 3 Å of a protein's backbone structure, surpassing the 66.8% achieved by the leading alternative method (the Phenix-based fully automatic method) on the same set of density maps. The C-CNN also achieved an average root-mean-square deviation (RMSD) of 1.24 Å on a set of 50 experimental density maps that was also tested with the Phenix-based fully automatic method. The source code and a demo of this research have been published at https://github.com/DrDongSi/Ca-Backbone-Prediction.
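The two evaluation figures quoted in the abstract (percentage of Cα atoms found within 3 Å, and RMSD) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `ca_match_metrics`, the nearest-neighbour matching rule, and the toy coordinates are all assumptions made for illustration, and the paper may pair predicted and native atoms differently.

```python
import math

def ca_match_metrics(native, predicted, cutoff=3.0):
    """Hypothetical helper: score a predicted C-alpha trace against a native one.

    native, predicted: non-empty lists of (x, y, z) coordinates in angstroms.
    Assumption: each native atom is paired with its nearest predicted atom,
    and counts as "found" if that distance is at most `cutoff`.
    Returns (percent of native atoms found, RMSD over the found atoms).
    """
    # Distance from every native atom to its closest predicted atom.
    nearest = [min(math.dist(n, p) for p in predicted) for n in native]
    found = [d for d in nearest if d <= cutoff]
    percent = 100.0 * len(found) / len(native)
    # RMSD is only defined when at least one atom was found.
    rmsd = math.sqrt(sum(d * d for d in found) / len(found)) if found else float("nan")
    return percent, rmsd

# Toy data (made up): four native atoms spaced 3.8 angstroms apart,
# three predicted atoms, one of which is far from any native atom.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.0, 1.0), (3.8, 1.0, 0.0), (7.6, 0.0, 5.0)]

percent, rmsd = ca_match_metrics(native, predicted)
print(f"found {percent:.1f}% of C-alpha atoms, RMSD {rmsd:.2f} angstroms")
```

In this toy case two of the four native atoms have a predicted atom within 3 Å, each exactly 1 Å away, so the sketch reports 50% found with an RMSD of 1 Å over the matched atoms.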
