Article; Proceedings Paper

Open Source Bayesian Models. 1. Application to ADME/Tox and Drug Discovery Datasets

Journal

JOURNAL OF CHEMICAL INFORMATION AND MODELING
Volume 55, Issue 6, Pages 1231-1245

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/acs.jcim.5b00143

Funding

  1. NIH National Center for Advancing Translational Sciences [9R44TR000942-02]
  2. National Institutes of Health (NIH), National Institute of Allergy and Infectious Diseases (NIAID) [R41-AI108003-01]
  3. Bill and Melinda Gates Foundation [49852]
  4. American Recovery and Reinvestment Act (NIH, NIAID) [1RC1AI086677-01]
  5. Rutgers University

Hundreds of absorption, distribution, metabolism, excretion, and toxicity (ADME/Tox) models have been described in the literature over the past decade, yet most are inaccessible to anyone but their authors. Public accessibility is also an issue for computational models of bioactivity, and the inability to share such models remains a major challenge limiting drug discovery. We describe the creation of a reference implementation of a Bayesian model-building software module, which we have released as an open source component that is now included in the Chemistry Development Kit (CDK) project and is also implemented in the CDD Vault and in several mobile apps. We use this implementation to build an array of Bayesian models for ADME/Tox, in vitro and in vivo bioactivity, and other physicochemical properties. We show that these models achieve cross-validation receiver operating characteristic (ROC) values comparable to those reported in prior publications that used alternative tools. We describe how implementing Bayesian models with FCFP6 descriptors generated in the CDD Vault enables the rapid production of robust machine learning models from public data or from the user's own datasets. The current study sets the stage for generating models in proprietary software (such as CDD) and exporting them in a format that can be run in open source software using CDK components. This work also demonstrates that biocomputation can be enabled across distributed private or public datasets to enhance drug discovery.
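
To make the idea of a fingerprint-based Bayesian model concrete, the following is a minimal sketch and not the CDK or CDD Vault code. It assumes the Laplacian-corrected naive Bayes formulation that is commonly paired with circular fingerprints; the abstract itself does not spell out the estimator. The function names (`train_laplacian_bayes`, `score`, `roc_auc`) are hypothetical, and the toy integer sets stand in for FCFP6 feature hashes that a cheminformatics toolkit would normally supply.

```python
import math
from collections import Counter


def train_laplacian_bayes(fingerprints, labels):
    """Return a per-feature weight table.

    fingerprints: list of sets of integer feature hashes (stand-ins for FCFP6 bits).
    labels: list of booleans, True for active.
    Assumed weight: log((A_f + 1) / (N_f * base_rate + 1)), where N_f is the
    number of molecules containing feature f and A_f the number of actives.
    """
    total = len(fingerprints)
    base_rate = sum(1 for y in labels if y) / total
    n_with = Counter()  # molecules containing each feature
    a_with = Counter()  # active molecules containing each feature
    for fp, is_active in zip(fingerprints, labels):
        for feature in fp:
            n_with[feature] += 1
            if is_active:
                a_with[feature] += 1
    return {
        f: math.log((a_with[f] + 1) / (n_with[f] * base_rate + 1))
        for f in n_with
    }


def score(model, fingerprint):
    """Sum the weights of the features present; unseen features contribute 0."""
    return sum(model.get(f, 0.0) for f in fingerprint)


def roc_auc(scores, labels):
    """ROC AUC as the fraction of active/inactive pairs ranked correctly (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: feature 1 appears only in actives, feature 5 only in inactives.
toy_fps = [{1, 2}, {1, 3}, {1, 4}, {5, 2}, {5, 3}, {5, 4}]
toy_labels = [True, True, True, False, False, False]
toy_model = train_laplacian_bayes(toy_fps, toy_labels)
toy_scores = [score(toy_model, fp) for fp in toy_fps]
toy_auc = roc_auc(toy_scores, toy_labels)
```

In this toy set the score is computed on the training molecules themselves, so the resulting AUC only illustrates the mechanics; a cross-validated estimate, as reported in the abstract, would hold out molecules before scoring them.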
