Article

Managing, profiling and analyzing a library of 2.6 million compounds gathered from 32 chemical providers

Journal

MOLECULAR DIVERSITY
Volume 10, Issue 3, Pages 389-403

Publisher

SPRINGER
DOI: 10.1007/s11030-006-9033-5

Keywords

chemical databases; chemoinformatics; diversity; drug-like; lead-like; screening

Data for 3.8 million compounds from the structural databases of 32 providers were gathered and stored in a single chemical database. Duplicates were removed using the IUPAC International Chemical Identifier (InChI), leaving 2.6 million compounds. Each provider database and the final merged database were studied in terms of uniqueness, diversity, frameworks, and 'drug-like' and 'lead-like' properties. The study shows that the database contains more than 97,000 frameworks and 2.1 million 'drug-like' molecules, of which more than one million are 'lead-like'. The study was carried out using 'ScreeningAssistant', software dedicated to chemical database management and screening set generation. Compounds are stored in a MySQL database, and all operations on this database are carried out by Java code. Drug-likeness and lead-likeness are estimated with 'in-house' scores based on functions that estimate how suitable each property value is; uniqueness is assessed using the InChI code, and diversity using molecular frameworks and fingerprints. The software was designed to make updating the database easy. 'ScreeningAssistant' is freely available under the GPL license.
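
The duplicate-removal step described in the abstract can be illustrated with a minimal sketch. It is not the ScreeningAssistant implementation, which the abstract says uses MySQL and Java; it is a plain in-memory Python illustration that assumes the InChI string of each compound has already been computed. The function names `merge_by_inchi` and `exclusive_counts`, the provider labels and the compound identifiers are hypothetical.

```python
# Illustrative sketch only: merge provider catalogues on a precomputed InChI
# string and count how many compounds each provider offers exclusively.
# All names here are hypothetical and not taken from ScreeningAssistant.

def merge_by_inchi(records):
    """Group (provider, compound_id, inchi) records by InChI.

    Returns a dict mapping each distinct InChI to the list of
    (provider, compound_id) pairs that supply it, so duplicates
    collapse into a single entry.
    """
    merged = {}
    for provider, compound_id, inchi in records:
        merged.setdefault(inchi, []).append((provider, compound_id))
    return merged


def exclusive_counts(merged):
    """Count, per provider, the structures that no other provider supplies."""
    counts = {}
    for sources in merged.values():
        providers = {provider for provider, _ in sources}
        if len(providers) == 1:
            (only,) = providers
            counts[only] = counts.get(only, 0) + 1
    return counts


# Standard InChI strings for three small molecules, used as sample data.
ETHANOL = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
BENZENE = "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H"
METHANE = "InChI=1S/CH4/h1H4"

sample = [
    ("provider_A", "A-001", ETHANOL),
    ("provider_A", "A-002", BENZENE),
    ("provider_B", "B-001", BENZENE),  # same structure as A-002
    ("provider_B", "B-002", METHANE),
]

merged = merge_by_inchi(sample)
print(len(sample), "records ->", len(merged), "unique structures")
print(exclusive_counts(merged))
```

Keeping the list of suppliers for each InChI, instead of discarding repeated records, is one way to preserve the per-provider uniqueness information that the study reports alongside the merged totals.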
