Article

Multiple machine learning algorithms assisted QSPR models for aqueous solubility: Comprehensive assessment with CRITIC-TOPSIS

Journal

SCIENCE OF THE TOTAL ENVIRONMENT
Volume 857

Publisher

ELSEVIER
DOI: 10.1016/j.scitotenv.2022.159448

Keywords

Aqueous solubility; Machine learning; Descriptor screening methods; Comprehensive evaluation; Organic contaminants


Aqueous solubility is an important environmental property that can be used to assess the hydrophobicity, ecological risk, and toxicity of organic pollutants. However, there is a lack of standard procedures for evaluating prediction models. In this study, the CRITIC-TOPSIS comprehensive assessment method was proposed and applied to evaluate 39 models developed using different algorithms and descriptor screening methods. The MLR-1, XGB-1, DNN-1, and kNN-1 models showed better predictive accuracy and external competitiveness compared to other models in each group. The XGB model based on SRM (XGB-1, C=0.599) was selected as the optimal pathway for predicting aqueous solubility.
As an essential environmental property, aqueous solubility quantifies the hydrophobicity of a compound and can be further used to evaluate the ecological risk and toxicity of organic pollutants. Prompted by the proliferation of organic contaminants in water and the associated technical burden, researchers have developed QSPR models to predict aqueous solubility. However, there are no standard procedures or best practices for comprehensively evaluating such models. Hence, a CRITIC-TOPSIS comprehensive assessment method, based on a variety of statistical parameters, is proposed for the first time in the field of environmental model research. A total of 39 models, based on 13 ML algorithms (belonging to 4 tribes) and 3 descriptor screening methods, were developed to reliably calculate aqueous solubility values (log Kws) for organic chemicals and to verify the effectiveness of the comprehensive assessment method. The evaluations showed that the MLR-1, XGB-1, DNN-1, and kNN-1 models had better predictive accuracy and external competitiveness than the other prediction models in their respective tribes. Further, the XGB model based on SRM (XGB-1, C = 0.599) was selected as the optimal pathway for predicting aqueous solubility. We hope that the proposed comprehensive evaluation approach can serve as a promising tool for selecting the optimum environmental property prediction methods.
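
The abstract names the CRITIC-TOPSIS assessment but does not spell out its formulas, so the following is only a minimal sketch of the textbook forms of the two steps, not the authors' implementation. CRITIC derives objective criterion weights from each criterion's spread and its correlation with the other criteria; TOPSIS then ranks alternatives by their relative closeness to an ideal solution. The reported C value is possibly this closeness score, although the abstract does not define it. The model names, the three criteria (R2, RMSE, MAE), and all numbers below are hypothetical and are not taken from the paper.

```python
import numpy as np

def critic_weights(X, benefit):
    """CRITIC weights for an (n_models, n_criteria) score matrix.

    Assumes no criterion is constant across models (otherwise the
    correlation matrix is undefined).
    """
    X = np.asarray(X, dtype=float)
    benefit = np.asarray(benefit, dtype=bool)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = hi - lo
    # Min-max scale so that 1 is always the best value of a criterion.
    Z = np.where(benefit, (X - lo) / span, (hi - X) / span)
    sigma = Z.std(axis=0, ddof=1)             # contrast intensity
    corr = np.corrcoef(Z, rowvar=False)       # criterion-by-criterion correlation
    info = sigma * (1.0 - corr).sum(axis=0)   # information carried by each criterion
    return info / info.sum()

def topsis_closeness(X, weights, benefit):
    """Relative closeness of each row to the ideal solution, in [0, 1]."""
    X = np.asarray(X, dtype=float)
    benefit = np.asarray(benefit, dtype=bool)
    # Vector-normalise each column, then apply the criterion weights.
    V = weights * X / np.linalg.norm(X, axis=0)
    best = np.where(benefit, V.max(axis=0), V.min(axis=0))
    worst = np.where(benefit, V.min(axis=0), V.max(axis=0))
    d_best = np.linalg.norm(V - best, axis=1)
    d_worst = np.linalg.norm(V - worst, axis=1)
    return d_worst / (d_best + d_worst)

# Hypothetical scores: columns are R2 (higher is better), RMSE and MAE (lower is better).
names = ["model_A", "model_B", "model_C", "model_D"]
scores = np.array([
    [0.90, 0.50, 0.40],
    [0.85, 0.55, 0.42],
    [0.70, 0.90, 0.75],
    [0.80, 0.70, 0.50],
])
benefit = np.array([True, False, False])

weights = critic_weights(scores, benefit)
closeness = topsis_closeness(scores, weights, benefit)

# Print models from highest to lowest closeness.
for i in np.argsort(-closeness):
    print(f"{names[i]}: closeness = {closeness[i]:.3f}")
```

In this toy input model_A is best on every criterion and model_C is worst on every criterion, so they receive the extreme closeness values of 1 and 0; real model comparisons would normally fall in between.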
