Journal
JOURNAL OF CHEMICAL INFORMATION AND MODELING
Volume 49, Issue 5, Pages 1193-1201
Publisher
AMER CHEMICAL SOC
DOI: 10.1021/ci8004644
Funding
- University of Sheffield through the Jacques-Emile Dubois Grant
- Tripos Inc.
Several recent studies have compared the relative performance of a selection of similarity coefficients when applied to chemical databases represented by binary fingerprints. Considerable variation in performance, when used for (dis)similarity-based techniques, such as similarity searching, database clustering, and dissimilarity-based compound selection, has been reported, the reasons for which are closely related to molecular size. For many of these similarity coefficients, an alternative form can be derived which is applicable to sets of nonbinary data, such as calculated or measured physicochemical properties, or counts of substructural fragments. Here we report on several studies which have been undertaken to investigate the relative performance of twelve coefficients when applied to nonbinary data using such (dis)similarity-based techniques. Results suggest that no single coefficient is appropriate for all methodologies investigated and that the size bias detected with binary data is not as apparent when the data and, hence, coefficient are nonbinary in nature.
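As an illustration of what a nonbinary form of a similarity coefficient can look like, the sketch below implements the widely used continuous forms of the Tanimoto and cosine coefficients. The abstract does not list the twelve coefficients studied, so the choice of these two, the function names, and the zero-vector convention are assumptions made for this example only, not the authors' implementation.

```python
import math

def tanimoto(x, y):
    """Continuous Tanimoto: sum(x*y) / (sum(x^2) + sum(y^2) - sum(x*y)).

    For 0/1 vectors this reduces to the binary form c / (a + b - c),
    where a and b are the bits set in each vector and c the bits in common.
    """
    dot = sum(xi * yi for xi, yi in zip(x, y))
    denom = sum(xi * xi for xi in x) + sum(yi * yi for yi in y) - dot
    # Assumed convention: two all-zero vectors get similarity 0.0.
    return dot / denom if denom else 0.0

def cosine(x, y):
    """Cosine coefficient: sum(x*y) / sqrt(sum(x^2) * sum(y^2))."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    denom = math.sqrt(sum(xi * xi for xi in x) * sum(yi * yi for yi in y))
    # Assumed convention: a zero vector gets similarity 0.0.
    return dot / denom if denom else 0.0

# Hypothetical fragment-count vectors for two molecules.
mol_a = [3, 0, 2]
mol_b = [1, 1, 2]
print(tanimoto(mol_a, mol_b))
print(cosine(mol_a, mol_b))
```

The same functions accept binary fingerprints unchanged, which makes it easy to compare binary and count-based representations of the same molecules.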