Article

Improved Machine Learning Models by Data Processing for Predicting Life-Cycle Environmental Impacts of Chemicals

ENVIRONMENTAL SCIENCE & TECHNOLOGY (2023)

Journal

ENVIRONMENTAL SCIENCE & TECHNOLOGY

Volume 57, Issue 8, Pages 3434-3444

Publisher

AMER CHEMICAL SOC

DOI: 10.1021/acs.est.2c04945

Keywords

life cycle assessment (LCA); machine learning; data processing; feature selection; weighted Euclidean distance

Categories

Engineering, Environmental; Environmental Sciences

Summary

To improve the prediction accuracy and interpretability of the life-cycle environmental impacts of chemicals, we utilized the mutual information-permutation importance (MI-PI) feature selection method for data processing and applied a weighted Euclidean distance method for data mining. Based on these data processing techniques, artificial neural network (ANN) models were developed to predict the environmental impacts of chemicals.
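The abstract does not state exactly how the mutual information (MI) and permutation importance (PI) scores are combined, so the following is only a minimal sketch under stated assumptions. Descriptors whose estimated MI with the target is at or below a threshold are dropped first. The survivors are then kept only if shuffling them on held-out data lowers R² (positive permutation importance). The histogram MI estimator, the k-nearest-neighbour stand-in model (used instead of the paper's ANN to keep the example dependency-free), the 0.05-nat threshold, and all function names are hypothetical. Because the filter returns indices of original columns, the retained descriptors keep their physicochemical meaning rather than being merged into new variables.

```python
import math
import random
from collections import Counter


def discretize(values, bins=5):
    """Equal-width binning so mutual information can be estimated from counts."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # a constant column falls into a single bin
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y, bins=5):
    """Plug-in estimate (in nats) of I(X; Y) from a 2-D histogram."""
    xd, yd = discretize(x, bins), discretize(y, bins)
    n = len(xd)
    joint, px, py = Counter(zip(xd, yd)), Counter(xd), Counter(yd)
    return sum(c / n * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def knn_predict(train_X, train_y, x, k=3):
    """Stand-in model (assumption): mean target of the k nearest training rows."""
    nearest = sorted(
        (sum((p - q) ** 2 for p, q in zip(row, x)), t)
        for row, t in zip(train_X, train_y)
    )[:k]
    return sum(t for _, t in nearest) / len(nearest)


def r2_score(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(tr_X, tr_y, va_X, va_y, col, repeats=5, seed=0):
    """Mean drop in validation R^2 when column `col` is shuffled."""
    rng = random.Random(seed)

    def score(rows):
        return r2_score(va_y, [knn_predict(tr_X, tr_y, r) for r in rows])

    base = score(va_X)
    drops = []
    for _ in range(repeats):
        shuffled = [row[col] for row in va_X]
        rng.shuffle(shuffled)
        permuted = [row[:col] + [v] + row[col + 1:] for row, v in zip(va_X, shuffled)]
        drops.append(base - score(permuted))
    return sum(drops) / repeats


def mi_pi_select(X, y, mi_threshold=0.05, train_frac=0.7, seed=0):
    """Hypothetical two-stage MI-PI filter; returns indices of retained descriptors."""
    # Stage 1 (MI): drop descriptors that share almost no information with y.
    stage1 = [j for j in range(len(X[0]))
              if mutual_information([row[j] for row in X], y) > mi_threshold]
    if not stage1:
        return []
    # Stage 2 (PI): refit on the survivors, keep those whose shuffling hurts R^2.
    sub = [[row[j] for j in stage1] for row in X]
    cut = int(train_frac * len(sub))
    return [j for pos, j in enumerate(stage1)
            if permutation_importance(sub[:cut], y[:cut], sub[cut:], y[cut:],
                                      pos, seed=seed) > 0]


# Toy data: column 0 drives y, column 1 is noise, column 2 is constant.
rng = random.Random(42)
X = [[rng.random(), rng.random(), 1.0] for _ in range(100)]
y = [2.0 * row[0] for row in X]
print(mi_pi_select(X, y))
```

In a real pipeline one would more likely call scikit-learn's `mutual_info_regression` and `sklearn.inspection.permutation_importance` on the trained network; the hand-rolled versions here only make the two-stage logic visible.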

Abstract

Machine learning (ML) provides an efficient means of rapidly predicting the life-cycle environmental impacts of chemicals, but challenges remain because of the low prediction accuracy and poor interpretability of the models. To address these issues, we focused on data processing. We used a mutual information-permutation importance (MI-PI) feature selection method to filter irrelevant molecular descriptors out of the input data; this improved model interpretability by preserving the physicochemical meanings of the original molecular descriptors without generating new variables. We also applied a weighted Euclidean distance method that quantifies the contribution of each feature in order to mine the data most relevant to the prediction targets, thereby improving prediction accuracy. On the basis of this data processing, we developed artificial neural network (ANN) models for predicting the life-cycle environmental impacts of chemicals, with R² values of 0.81, 0.81, 0.84, 0.75, 0.73, and 0.86 for global warming, human health, metal depletion, freshwater ecotoxicity, particulate matter formation, and terrestrial acidification, respectively. The ML models were interpreted with the Shapley additive explanations (SHAP) method, which quantifies the contribution of each input molecular descriptor to each environmental impact category. This work suggests that combining MI-PI feature selection with source data selection based on weighted Euclidean distance has promising potential to improve the accuracy and interpretability of models for predicting the life-cycle environmental impacts of chemicals.
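The weighted Euclidean distance named in the abstract can be written as d_w(a, b) = sqrt(Σ_j w_j (a_j - b_j)²), so a larger weight w_j lets descriptor j dominate the notion of similarity between two chemicals. The sketch below assumes that the per-descriptor contributions are non-negative importance scores normalized to sum to 1, and that mining "the data most relevant to the predicted targets" means keeping the source chemicals with the smallest mean weighted distance to the chemicals being predicted. The selection rule, the `keep` parameter, the toy descriptor values, and the function names are assumptions, not the authors' procedure. Descriptors are assumed to be scaled to comparable ranges beforehand, since Euclidean distances are scale-sensitive.

```python
import math


def normalize_weights(contributions):
    """Map per-descriptor contribution scores to non-negative weights summing to 1."""
    clipped = [max(c, 0.0) for c in contributions]
    total = sum(clipped)
    if total == 0:
        return [1.0 / len(clipped)] * len(clipped)  # no information: equal weights
    return [c / total for c in clipped]


def weighted_euclidean(a, b, weights):
    """d_w(a, b) = sqrt(sum_j w_j * (a_j - b_j)^2)."""
    return math.sqrt(sum(w * (p - q) ** 2 for w, p, q in zip(weights, a, b)))


def select_source_data(source_X, target_X, contributions, keep):
    """Indices of the `keep` source rows with the smallest mean weighted distance to the targets."""
    w = normalize_weights(contributions)
    ranked = sorted(
        (sum(weighted_euclidean(row, t, w) for t in target_X) / len(target_X), i)
        for i, row in enumerate(source_X)
    )
    return [i for _, i in ranked[:keep]]


source = [[0.1, 0.6], [0.2, 0.1], [0.9, 0.5], [0.4, 0.2]]  # scaled toy descriptors
target = [[0.1, 0.2]]
print(select_source_data(source, target, [0.9, 0.1], keep=2))  # -> [1, 0]
print(select_source_data(source, target, [1.0, 1.0], keep=2))  # -> [1, 3]
```

With weights [0.9, 0.1] the first descriptor dominates, so source row 0 (identical to the target in that descriptor) outranks row 3; with equal weights the ranking of those two rows flips. That shift is the effect the feature-contribution weighting is meant to capture.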
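The SHAP method mentioned in the abstract attributes a model prediction to its inputs using Shapley values from cooperative game theory. In practice this would be done with the `shap` package on the trained ANN. The stdlib sketch below instead computes exact Shapley values by enumerating every coalition of features for a small hypothetical `toy_model`, replacing absent features with a single baseline vector (an assumption; SHAP explainers typically average over a background dataset). It only illustrates what "quantifying the contribution of each input molecular descriptor" means numerically.

```python
import math
from itertools import combinations


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to one baseline input.

    A feature absent from a coalition is replaced by its baseline value.
    Each feature needs 2^(n-1) coalitions, so this only suits a few features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def toy_model(d):
    """Hypothetical surrogate predictor: one additive term, one interaction term."""
    return 2.0 * d[0] + 3.0 * d[1] * d[2]


x, baseline = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
print(sum(phi), toy_model(x) - toy_model(baseline))
```

For this toy model the contributions are 2, 9 and 9 (up to floating-point rounding): the additive term is credited entirely to descriptor 0, the interaction term is split evenly between descriptors 1 and 2, and the values sum to f(x) - f(baseline) = 20. This efficiency property is what makes Shapley attributions additive.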
