Article

Predictive QSAR modeling based on diversity sampling of experimental datasets for the training and test set selection

JOURNAL OF COMPUTER-AIDED MOLECULAR DESIGN (2002)

Journal

JOURNAL OF COMPUTER-AIDED MOLECULAR DESIGN

Volume 16, Issue 5-6, Pages 357-369

Publisher

SPRINGER

DOI: 10.1023/A:1020869118689

Categories

Biochemistry & Molecular Biology; Biophysics; Computer Science, Interdisciplinary Applications

Abstract

One of the most important characteristics of Quantitative Structure-Activity Relationship (QSAR) models is their predictive power. The latter can be defined as the ability of a model to accurately predict the target property (e.g., biological activity) of compounds that were not used for model development. We suggest that this goal can be achieved by rational division of an experimental SAR dataset into the training and test sets, which are used for model development and validation, respectively. Given that all compounds are represented by points in multidimensional descriptor space, we argue that training and test sets must satisfy the following criteria: (i) representative points of the test set must be close to those of the training set; (ii) representative points of the training set must be close to representative points of the test set; (iii) the training set must be diverse. For quantitative description of these criteria, we use molecular dataset diversity indices introduced recently (Golbraikh, A., J. Chem. Inf. Comput. Sci., 40 (2000) 414-425). For rational division of a dataset into the training and test sets, we use three closely related sphere-exclusion algorithms. Using several experimental datasets, we demonstrate that QSAR models built and validated with our approach have statistically better predictive power than models generated with either random or activity-ranking-based selection of the training and test sets. We suggest that rational approaches to the selection of training and test sets based on diversity principles should be used routinely in all QSAR modeling research.
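The abstract does not spell out the three sphere-exclusion variants, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes Euclidean distance in descriptor space, a user-chosen `radius`, and one hypothetical assignment rule: each sphere centre goes to the training set and every remaining compound inside the sphere goes to the test set. The function names and the toy `points` are illustrative.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sphere_exclusion_split(points, radius, seed=0):
    """Split the indices of `points` into (training, test) lists.

    Hypothetical variant: pick a random remaining compound as a sphere
    centre and add it to the training set; every other remaining compound
    within `radius` of that centre goes to the test set. Repeat until no
    compounds remain.
    """
    rng = random.Random(seed)
    remaining = list(range(len(points)))
    train, test = [], []
    while remaining:
        centre = remaining.pop(rng.randrange(len(remaining)))
        train.append(centre)
        outside = []
        for i in remaining:
            if euclidean(points[centre], points[i]) <= radius:
                test.append(i)      # inside the sphere: test set
            else:
                outside.append(i)   # still available as a future centre
        remaining = outside
    return train, test


def max_nearest_distance(points, from_idx, to_idx):
    """Largest distance from a point in `from_idx` to its nearest point in `to_idx`."""
    return max(
        min(euclidean(points[i], points[j]) for j in to_idx)
        for i in from_idx
    )


# Toy 2-D descriptor space: two tight clusters and one isolated compound.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
train, test = sphere_exclusion_split(points, radius=0.5, seed=1)
print("training:", sorted(train))
print("test:", sorted(test))
print("max test-to-training distance:", max_nearest_distance(points, test, train))
```

Under this rule every test compound lies within `radius` of some training compound, which corresponds to criterion (i), and training compounds are pairwise more than `radius` apart, which gives a simple form of criterion (iii). Criterion (ii) is not guaranteed: an isolated compound such as the last point above ends up in the training set with no nearby test compound, so it would have to be checked separately, for example with `max_nearest_distance(points, train, test)`.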
