4.7 Article

Structural and Sequence Similarity Makes a Significant Impact on Machine-Learning-Based Scoring Functions for Protein-Ligand Interactions

期刊

JOURNAL OF CHEMICAL INFORMATION AND MODELING
卷 57, 期 4, 页码 1007-1012

出版社

AMER CHEMICAL SOC
DOI: 10.1021/acs.jcim.7b00049

关键词

-

资金

  1. National Natural Science Foundation of China [11501306, 81520108019]
  2. Ministry of Science and Technology 973 Project [2014CB542800]
  3. Thousand Youth Talents Plan of China

向作者/读者索取更多资源

The prediction of protein-ligand binding affinity has recently been improved remarkably by machine-learning-based scoring functions. For example, using a set Of simple descriptors representing the atomic distance counts, the RF-Score improves the Pearson correlation coefficient to about 0.8 on the core set of the PDBbind 2007 database, which is significantly higher than the performance of any conventional scoring function on the same benchmark. A few studies have been made to discuss the performance of machine-learning-based, methods, but the reason for this improvement remains unclear. In this study, by systemically controlling the structural and sequence similarity between the training and test proteins of the PDBbind benchmark, we demonstrate that protein structural and sequence Similarity makes a significant impact on machine-learning-based methods. After removal of training proteins that are highly similar to the test proteins identified by structure alignment and sequence alignment, machine-learning-based methods trained on the new training sets do not outperform the conventional scoring functions any more. On the contrary, the performance of conventional functions like X-Score is relatively stable no matter what training data are used to fit the weights of its energy terms.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.7
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据