QSAR Modeling of Imbalanced High-Throughput Screening Data in PubChem

Journal

JOURNAL OF CHEMICAL INFORMATION AND MODELING
Volume 54, Issue 3, Pages 705-712

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/ci400737s

Funding

  1. Frederick National Laboratory for Cancer Research, National Institutes of Health [HHSN261200800001E]

Many of the structures in PubChem are annotated with activities determined in high-throughput screening (HTS) assays. Because of the nature of these assays, the activity data are typically strongly imbalanced, with a small number of active compounds contrasting with a very large number of inactive compounds. We have used several such imbalanced PubChem HTS assays to develop and test strategies for efficiently building robust QSAR models from imbalanced data sets. Different descriptor types [Quantitative Neighborhoods of Atoms (QNA) and biological descriptors] were used to generate a variety of QSAR models in the program GUSAR. The models obtained were compared using external test and validation sets. We also report on our efforts to incorporate the most predictive of our models in the publicly available NCI/CADD Group Web services (http://cactus.nci.nih.gov/chemical/apps/cap).
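To make the imbalance problem concrete, the following is a minimal sketch of one common way to rebalance a screening data set: random undersampling of the inactive class. It is an illustration only. The abstract does not state which balancing strategies the authors used, and the function name `undersample`, the `ratio` parameter, and the toy compound identifiers are all hypothetical.

```python
import random


def undersample(records, ratio=1.0, seed=0):
    """Keep every active and a random subset of inactives.

    records: list of (compound_id, is_active) pairs.
    ratio:   desired number of inactives per active (hypothetical parameter).
    seed:    seed for the random generator, for reproducible sampling.
    """
    actives = [r for r in records if r[1]]
    inactives = [r for r in records if not r[1]]
    # Never ask for more inactives than are available.
    n_keep = min(len(inactives), int(round(ratio * len(actives))))
    rng = random.Random(seed)
    return actives + rng.sample(inactives, n_keep)


# Toy data: 5 actives among 1000 compounds (made up, not taken from the paper).
records = [(f"CID{i}", i < 5) for i in range(1000)]
balanced = undersample(records, ratio=1.0)
```

With `ratio=1.0` the reduced set holds as many inactives as actives; larger values keep more of the inactive class at the cost of a stronger imbalance.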
