Article

Similarity to molecules in the training set is a good discriminator for prediction accuracy in QSAR

JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES (2004)

Journal

JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES

Volume 44, Issue 6, Pages 1912-1928

Publisher

AMER CHEMICAL SOC

DOI: 10.1021/ci049782w


Abstract

How well can a QSAR model predict the activity of a molecule not in the training set used to create the model? A set of retrospective cross-validation experiments using 20 diverse in-house activity sets was performed to find a good discriminator of prediction accuracy, as measured by the root-mean-square difference between observed and predicted activity. Among the measures we tested, two seem useful: the similarity of the molecule to be predicted to the nearest molecule in the training set, and/or the number of neighbors in the training set, where neighbors are those more similar than a user-chosen cutoff. The molecules with the highest similarity and/or the most neighbors are the best predicted. This trend holds true for narrow training sets and, to a lesser degree, for many diverse training sets, and it does not depend on which QSAR method or descriptor is used. One may define the similarity using a different descriptor from the one used for the QSAR model. The similarity dependence for diverse training sets is somewhat unexpected. It appears to be greater for those data sets where the association of similar activities with similar structures (as encoded in the Patterson plot) is stronger. We propose a way to estimate the reliability of the prediction of an arbitrary chemical structure on a given QSAR model, given the training set from which the model was derived.
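The two reliability measures named in the abstract (similarity to the nearest training molecule, and the count of training neighbors above a cutoff), together with the root-mean-square error used to judge accuracy, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes each molecule's fingerprint is stored as a set of "on" bit indices and uses the Tanimoto coefficient as the similarity. The abstract does not say which descriptor or similarity measure the study used. All function names (`tanimoto`, `applicability_metrics`, `rmse`, `rmse_by_similarity_bin`) are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple

# Assumption: a fingerprint is the set of indices of its "on" bits.
Fingerprint = frozenset


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity |A & B| / |A | B| (assumed measure, not from the paper)."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def applicability_metrics(query: Fingerprint, training: Sequence[Fingerprint],
                          cutoff: float) -> Tuple[float, int]:
    """Return (similarity to nearest training molecule, number of neighbors).

    Neighbors are training molecules strictly more similar than `cutoff`.
    """
    sims = [tanimoto(query, t) for t in training]
    nearest = max(sims, default=0.0)
    n_neighbors = sum(1 for s in sims if s > cutoff)
    return nearest, n_neighbors


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square difference between observed and predicted activity."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def rmse_by_similarity_bin(nearest_sims: Sequence[float], observed: Sequence[float],
                           predicted: Sequence[float],
                           edges: Sequence[float]) -> Dict[Tuple[float, float], float]:
    """Group predictions into [lo, hi) bins of nearest-neighbor similarity
    and report the RMSE within each non-empty bin."""
    result: Dict[Tuple[float, float], float] = {}
    for lo, hi in zip(edges, edges[1:]):
        idx: List[int] = [i for i, s in enumerate(nearest_sims) if lo <= s < hi]
        if idx:
            result[(lo, hi)] = rmse([observed[i] for i in idx],
                                    [predicted[i] for i in idx])
    return result


if __name__ == "__main__":
    training = [frozenset({1, 2, 3, 4}), frozenset({1, 2, 3, 5}), frozenset({7, 8, 9})]
    query = frozenset({1, 2, 3})
    print(applicability_metrics(query, training, cutoff=0.5))  # (0.75, 2)
```

Binning the cross-validated errors by nearest-neighbor similarity mirrors, in spirit, the retrospective analysis the abstract describes: if similarity is a useful discriminator, the RMSE in the high-similarity bins should be lower than in the low-similarity bins. The bin edges and the cutoff here are arbitrary illustrative choices.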
