Article

Direct Prediction of Bioaccumulation of Organic Contaminants in Plant Roots from Soils with Machine Learning Models Based on Molecular Structures

Journal

ENVIRONMENTAL SCIENCE & TECHNOLOGY
Volume 55, Issue 24, Pages 16358-16368

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/acs.est.1c02376

Keywords

machine learning; root concentration factor (RCF); plant uptake; extended connectivity fingerprints (ECFP); molecular structure; gradient boosting regression tree (GBRT)

Funding

  1. National Key Research and Development Program of China [2020YFC1806801, 2019YFC1604503, 2016YFD0800403]

The study developed a machine learning model to predict RCF values from a data set of 341 data points covering 72 chemicals, revealing nonlinear relationships among chemical, soil, and plant properties and identifying key chemical substructures related to RCF.

Root concentration factor (RCF) is an important parameter for characterizing the accumulation of organic contaminants in plants from soils in life cycle impact assessment (LCIA) and phytoremediation potential assessment. However, building robust predictive models remains challenging because of the complex interactions within chemical-soil-plant root systems. Here we developed end-to-end machine learning models to resolve the complex relationship between molecular structure and RCF by training on a unified RCF data set with 341 data points covering 72 chemicals. The proposed gradient boosting regression tree (GBRT) model, based on extended connectivity fingerprints (ECFP), predicted RCF values with an R-squared of 0.77 and a mean absolute error (MAE) of 0.22 under 5-fold cross validation. In addition, our results reveal nonlinear relationships among chemical, soil, and plant properties. Further in-depth analyses identify the key chemical topological substructures (e.g., -O, -Cl, aromatic rings, and large conjugated pi-systems) related to RCF. Owing to its simplicity and universality, the GBRT-ECFP model provides a valuable tool for LCIA and other environmental assessments to better characterize chemical risks to human health and ecosystems.
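
The workflow named in the abstract (a gradient-boosted regressor on binary fingerprint bits, scored by R-squared and MAE under 5-fold cross validation) can be illustrated with a minimal, dependency-free sketch. Everything here is an assumption made for illustration, not the authors' implementation: the fingerprints are random bit vectors standing in for ECFP, the targets are synthetic rather than the 341-point RCF data set, the base learners are single-split stumps rather than full regression trees, and the names `fit_gbrt`, `cross_validate`, and `make_synthetic` are hypothetical.

```python
import random

def fit_stump(X, residuals):
    """Find the single bit whose on/off split gives the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        on = [r for x, r in zip(X, residuals) if x[j]]
        off = [r for x, r in zip(X, residuals) if not x[j]]
        if not on or not off:
            continue  # bit is constant in this sample; it cannot split
        m_on, m_off = sum(on) / len(on), sum(off) / len(off)
        sse = sum((r - m_on) ** 2 for r in on) + sum((r - m_off) ** 2 for r in off)
        if best is None or sse < best[0]:
            best = (sse, j, m_on, m_off)
    return best[1:] if best else None

def fit_gbrt(X, y, n_rounds=100, learning_rate=0.3):
    """Squared-error gradient boosting: each stump is fitted to the residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        j, m_on, m_off = stump
        stumps.append(stump)
        pred = [p + learning_rate * (m_on if x[j] else m_off)
                for x, p in zip(X, pred)]
    return base, stumps, learning_rate

def predict(model, X):
    base, stumps, lr = model
    return [base + lr * sum(m_on if x[j] else m_off for j, m_on, m_off in stumps)
            for x in X]

def mae(y_true, y_pred):
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot

def cross_validate(X, y, k=5, seed=0):
    """Shuffle, split into k folds, and score the pooled held-out predictions."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    y_true, y_pred = [], []
    for fold in (idx[i::k] for i in range(k)):
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_gbrt([X[i] for i in train], [y[i] for i in train])
        y_pred += predict(model, [X[i] for i in fold])
        y_true += [y[i] for i in fold]
    return r2(y_true, y_pred), mae(y_true, y_pred)

def make_synthetic(n=200, n_bits=16, seed=1):
    """Synthetic stand-in data: random bits, target driven by a few of them."""
    rng = random.Random(seed)
    X = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n)]
    y = [0.8 * x[0] - 0.5 * x[3] + 0.6 * x[5] * x[7] + rng.gauss(0, 0.1)
         for x in X]
    return X, y

X, y = make_synthetic()
score_r2, score_mae = cross_validate(X, y)
print(f"5-fold CV: R2={score_r2:.2f}, MAE={score_mae:.2f}")
```

In practice, real fingerprints would come from a cheminformatics toolkit and the regressor from an established library; the stumps used here cannot represent interactions between bits, which is one reason deeper trees are typically preferred for this kind of data.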
