Article

MULTICOM2 open-source protein structure prediction system powered by deep learning and distance prediction

Journal

SCIENTIFIC REPORTS
Volume 11, Issue 1

Publisher

NATURE RESEARCH
DOI: 10.1038/s41598-021-92395-6

Funding

  1. National Science Foundation [DBI1759934, IIS1763246]
  2. National Institutes of Health [GM093123]
  3. Department of Energy, USA [DE-AR0001213, DE-SC0020400, DE-SC0021303]

This paper introduces MULTICOM2, the latest open-source protein tertiary structure prediction system, which integrates template-based and template-free modeling methods and predicts good tertiary structures for both regular and hard targets. The accuracy of the template-free method alone on TBM and FM targets is very close to that of the combined template-based and template-free methods, demonstrating that distance-based template-free modeling powered by deep learning can largely replace traditional template-based modeling even on TBM targets.
Protein structure prediction is an important problem in bioinformatics and has been studied for decades. However, few comprehensive open-source protein structure prediction packages are publicly available in the field. In this paper, we present our latest open-source protein tertiary structure prediction system, MULTICOM2, an integration of template-based modeling (TBM) and template-free modeling (FM) methods. The template-based modeling uses sequence alignment tools with deep multiple sequence alignments to search for structural templates, which makes the template search much faster and more accurate than in MULTICOM1. The template-free (ab initio or de novo) modeling uses the inter-residue distances predicted by DeepDist to reconstruct tertiary structure models without using any known structure as a template. In the blind CASP14 experiment, the average TM-score of the models predicted by our server predictor based on the MULTICOM2 system is 0.720 for 58 TBM (regular) domains and 0.514 for 38 FM and FM/TBM (hard) domains, indicating that MULTICOM2 is capable of predicting good tertiary structures across the board. It can predict the correct fold for 76 CASP14 domains (95% of regular domains and 55% of hard domains) if only one prediction is made for a domain. The success rate increases by 3% for both regular and hard domains if five predictions are made per domain. Moreover, the prediction accuracy of the pure template-free structure modeling method on both TBM and FM targets is very close to that of the combination of template-based and template-free modeling methods. This demonstrates that the distance-based template-free modeling method powered by deep learning can largely replace the traditional template-based modeling method, even on the TBM targets that TBM methods used to dominate, and therefore provides a uniform structure modeling approach for any protein.
Finally, on the 38 CASP14 FM and FM/TBM hard domains, MULTICOM2 server predictors (MULTICOM-HYBRID, MULTICOM-DEEP, MULTICOM-DIST) were ranked among the top 20 automated server predictors in the CASP14 experiment. After combining multiple predictors from the same research group as one entry, MULTICOM-HYBRID was ranked no. 5. The source code of MULTICOM2 is freely available at https://github.com/multicom-toolbox/multicom/tree/multicom_v2.0.
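The TM-score figures quoted in the abstract can be made concrete with a small sketch. The code below is not taken from MULTICOM2. It uses the standard TM-score definition, evaluated for a single fixed superposition, whereas real TM-score programs search over superpositions and report the maximum. The function names, the 0.5 floor on d0, and the "TM-score above 0.5 means correct fold" threshold are assumptions of this sketch; the abstract does not state its correct-fold criterion. The last helper only checks that the reported percentages are consistent with the reported total of 76 domains.

```python
from typing import Sequence


def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 (in angstroms) of the TM-score."""
    if target_length <= 15:
        raise ValueError("d0 formula is undefined for 15 residues or fewer")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    # Assumption of this sketch: floor d0 at 0.5 so short targets stay well-behaved.
    return max(d0, 0.5)


def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score for ONE fixed superposition.

    distances: distance in angstroms between each aligned model/native
    residue pair. Unaligned residues contribute nothing to the sum but
    still count in the normalisation by target_length.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def is_correct_fold(score: float, threshold: float = 0.5) -> bool:
    """Assumed convention: a TM-score above 0.5 indicates the same fold."""
    return score > threshold


def approx_correct_domains() -> int:
    """Consistency check on the abstract's numbers (top-1 predictions)."""
    regular = round(0.95 * 58)  # 95% of 58 TBM (regular) domains
    hard = round(0.55 * 38)     # 55% of 38 FM and FM/TBM (hard) domains
    return regular + hard       # 55 + 21 = 76


if __name__ == "__main__":
    print(is_correct_fold(0.720), is_correct_fold(0.514))
    print(approx_correct_domains())
```

Under this assumed threshold, both reported averages (0.720 and 0.514) fall on the correct-fold side, and the rounded per-category counts add up to the 76 domains stated in the abstract.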
