4.7 Article

PROST: AlphaFold2-aware Sequence-Based Predictor to Estimate Protein Stability Changes upon Missense Mutations

Journal

JOURNAL OF CHEMICAL INFORMATION AND MODELING
Volume 62, Issue 17, Pages 4270-4282

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/acs.jcim.2c00799

Funding

  1. Major Inter-Disciplinary Research (IDR) Grant - Monash University

PROST is a sequence-based predictor for protein stability changes caused by single-point missense mutations. It utilizes various sequence-based features, physicochemical properties, evolutionary information, and predicted structural features to accurately predict the protein stability changes. The performance of PROST is evaluated on multiple datasets and compared with state-of-the-art predictors, demonstrating its superiority.
An essential step in engineering proteins and understanding disease-causing missense mutations is to accurately model the protein stability changes that such mutations cause. Here, we developed a new sequence-based predictor for the protein stability (PROST) change (Gibbs free energy change, ΔΔG) upon a single-point missense mutation. PROST extracts multiple descriptors from the most promising sequence-based predictors, such as BoostDDG, SAAFEC-SEQ, and DDGun. PROST also extracts descriptors from iFeature and AlphaFold2. The extracted descriptors include sequence-based features, physicochemical properties, evolutionary information, evolutionary-based physicochemical properties, and predicted structural features. The PROST predictor is a weighted average ensemble model based on extreme gradient boosting (XGBoost) decision trees and an extra-trees regressor; PROST is trained on both direct and hypothetical reverse mutations using the S5294 data set (S2647 direct mutations + S2647 inverse mutations). The parameters of the PROST model are optimized using grid search with 5-fold cross-validation, and feature importance analysis reveals the most relevant features. The performance of PROST is evaluated in a blinded manner on nine distinct data sets and compared against existing state-of-the-art sequence-based and structure-based predictors. The method consistently performs well on the frataxin, S217, S349, Ssym, S669, Myoglobin, and CAGI5 data sets in blind tests and performs similarly to the state-of-the-art predictors on the p53 and S276 data sets. When compared with the latest predictors, such as BoostDDG, SAAFEC-SEQ, ACDC-NN-seq, and DDGun, PROST outperforms them. A case study of mutation scanning of the frataxin protein for nine wild-type residues demonstrates the utility of PROST. Taken together, these findings indicate that PROST is a well-suited predictor when no protein structural information is available.
The source code of PROST, data sets, examples, and pretrained models, along with usage instructions, are available at https://github.com/ShahidIqb/PROST and https://prost.erc.monash.edu/seq.
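Two ideas from the abstract can be illustrated with a short sketch: training on hypothetical reverse mutations, and combining two regressors by a weighted average. The sketch below is not the PROST implementation. It assumes only the thermodynamic antisymmetry ΔΔG(B→A) = -ΔΔG(A→B), and every name in it (`Mutation`, `reverse_mutation`, `augment_with_reverse`, `weighted_average`) as well as the example weight is hypothetical and chosen for illustration. The two prediction lists stand in for the outputs of an XGBoost model and an extra-trees regressor.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Mutation:
    """A single-point missense mutation with its measured ddG (hypothetical record type)."""
    sequence: str   # sequence before the mutation is applied
    position: int   # 1-based index of the mutated residue
    wild: str       # residue present in `sequence` at `position`
    mutant: str     # residue introduced by the mutation
    ddg: float      # stability change in kcal/mol


def reverse_mutation(m: Mutation) -> Mutation:
    """Build the hypothetical reverse mutation: swap residues and negate ddG."""
    idx = m.position - 1
    if m.sequence[idx] != m.wild:
        raise ValueError("wild-type residue does not match the sequence")
    # The reverse mutation starts from the mutated sequence.
    mutated = m.sequence[:idx] + m.mutant + m.sequence[idx + 1:]
    return Mutation(mutated, m.position, m.mutant, m.wild, -m.ddg)


def augment_with_reverse(direct: Sequence[Mutation]) -> List[Mutation]:
    """Return the direct mutations followed by their reverse counterparts."""
    return list(direct) + [reverse_mutation(m) for m in direct]


def weighted_average(pred_a: Sequence[float], pred_b: Sequence[float],
                     weight_a: float) -> List[float]:
    """Combine two models' predictions; `weight_a` is an assumed tunable weight."""
    if len(pred_a) != len(pred_b):
        raise ValueError("prediction lists must have the same length")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(pred_a, pred_b)]


if __name__ == "__main__":
    # Toy data, not taken from any PROST data set.
    direct = [Mutation("MKTAY", 3, "T", "A", -1.2)]
    for m in augment_with_reverse(direct):
        print(m)
    print(weighted_average([1.0, -2.0], [3.0, 0.0], weight_a=0.25))
```

Augmenting a set of N direct mutations this way yields 2N training examples, which matches the structure the abstract describes for S5294 (S2647 direct plus S2647 inverse). In practice the ensemble weight would be selected on validation data rather than fixed by hand.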
