
Assessment of machine learning approaches for predicting the crystallization propensity of active pharmaceutical ingredients

CRYSTENGCOMM (2019)

Journal

CRYSTENGCOMM

Volume 21, Issue 8, Pages 1215-1223

Publisher

ROYAL SOC CHEMISTRY

DOI: 10.1039/c8ce01589a

Abstract

In the current report, three machine learning approaches were assessed for their ability to predict the crystallization propensities of a set of small organic compounds (<709 Da). The algorithms evaluated were random forest regression (RFR), support vector machine regression (SVMR) and neural networks (NN). In addition to these algorithms, the influence of different molecular descriptors, the size of the training sets used, and various experimental factors on the predictive ability of the methods was also taken into consideration. For example, factors such as the solvent used, presence of impurities and/or degradants, influence of potential seeded crystallizations and implied supersaturation levels were explicitly investigated. For smaller training set sizes (e.g., approximately 50), very little difference in the accuracy of the three algorithms was observed. However, beyond training set sizes of 150, the RFR algorithm typically outperformed the others by up to 20% RMSE. Additionally, as a result of the improved performance with larger training set sizes, the RFR models built with the explicit treatment of solvent typically outperformed models only considering the active pharmaceutical ingredient (API). For example, the best performing API-only model had an RMSE of 30%, whereas for the API + solvent models the RMSE was found to be 20%. Beyond inclusion of the solvent, it was found that the presence of impurities and/or degradants had the greatest influence on model accuracy. When these experiments were excluded, an additional improvement of up to 10% RMSE was observed in some cases.
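The abstract compares models by root-mean-square error (RMSE), so a minimal sketch of that metric may help when reading the figures quoted above. It assumes crystallization propensity is expressed as a percentage, which the reported RMSE values suggest but the abstract does not define. The function name `rmse` and the sample numbers are hypothetical illustrations, not data or code from the paper.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average the squared residuals, then take the square root.
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical propensities in percent (assumed unit, not from the paper).
observed = [0.0, 50.0, 100.0, 25.0]
predicted = [10.0, 40.0, 90.0, 35.0]
error = rmse(predicted, observed)
```

Because the result carries the same unit as the inputs, an RMSE of 20% under this assumption would mean predictions deviate from the measured propensity by roughly 20 percentage points in the root-mean-square sense.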
