AMPL: A Data-Driven Modeling Pipeline for Drug Discovery

JOURNAL OF CHEMICAL INFORMATION AND MODELING (2020)

Journal

JOURNAL OF CHEMICAL INFORMATION AND MODELING

Volume 60, Issue 4, Pages 1955-1968

Publisher

AMER CHEMICAL SOC

DOI: 10.1021/acs.jcim.9b01053

Funding

Lawrence Livermore National Laboratory
National Nuclear Security Administration
GlaxoSmithKline, LLC
National Cancer Institute, National Institutes of Health
Department of Health and Human Services [75N91019D00024]
U.S. Department of Energy [DE-AC52-07NA27344]

Abstract

One of the key requirements for incorporating machine learning (ML) into the drug discovery process is complete traceability and reproducibility of the model building and evaluation process. With this in mind, we have developed an end-to-end modular and extensible software pipeline for building and sharing ML models that predict key pharma-relevant parameters. The ATOM Modeling PipeLine, or AMPL, extends the functionality of the open source library DeepChem and supports an array of ML and molecular featurization tools. We have benchmarked AMPL on a large collection of pharmaceutical data sets covering a wide range of parameters. Our key findings indicate that traditional molecular fingerprints underperform other feature representation methods. We also find that data set size correlates directly with prediction performance, which points to the need to expand public data sets. Uncertainty quantification can help predict model error, but correlation with error varies considerably between data sets and model types. Our findings point to the need for an extensible pipeline that can be shared to make model building more widely accessible and reproducible. This software is open source and available at: https://github.com/ATOMconsortium/AMPL.
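The abstract's observation that uncertainty estimates correlate with model error to varying degrees can be illustrated with a small rank-correlation check. The sketch below is not AMPL code and does not use the AMPL or DeepChem APIs. The function names, the synthetic data, and the choice of Spearman correlation are all assumptions made for illustration, and the numbers it produces say nothing about the paper's results.

```python
import random
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def uncertainty_error_correlation(y_true, y_pred, y_std):
    """Rank correlation between predicted uncertainty and absolute error.

    y_std is a hypothetical per-compound uncertainty estimate, for example
    a predicted standard deviation from some model.
    """
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return spearman(y_std, abs_err)


# Synthetic data (an assumption for illustration only): the prediction
# noise is drawn with the same scale as the reported uncertainty, so the
# two should be positively, but not perfectly, correlated.
rng = random.Random(0)
y_true = [rng.uniform(0, 10) for _ in range(200)]
y_std = [rng.uniform(0.1, 2.0) for _ in range(200)]
y_pred = [t + rng.gauss(0, s) for t, s in zip(y_true, y_std)]

rho = uncertainty_error_correlation(y_true, y_pred, y_std)
print(f"Spearman rho between uncertainty and |error|: {rho:.2f}")
```

A coefficient near 1 would mean the uncertainty estimate ranks compounds by error almost perfectly, while a value near 0 would mean it carries little information about error. Running the same check separately for each data set and model type is one way to see the kind of variation the abstract describes.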
