Article

Determination of bioavailable arsenic threshold and validation of modeled permissible total arsenic in paddy soil using machine learning

Journal

JOURNAL OF ENVIRONMENTAL QUALITY
Volume 52, Issue 2, Pages 315-327

Publisher

WILEY
DOI: 10.1002/jeq2.20452

Minimizing arsenic intake from food is crucial in arsenic-contaminated regions, especially where rice is the main staple. This study validated previously developed models for the maximum allowable concentration of total arsenic in paddy soil and built further models to estimate a limit for bioavailable arsenic in paddy soil. The decision tree (DT) model performed better for the maximum allowable total arsenic in soil, while the logistic regression (LR) model performed better for bioavailable arsenic.
Minimizing arsenic (As) intake from food consumption is a key aspect of the public health response in As-contaminated regions. In many of these regions, rice is the predominant staple food. Here, we present a validated maximum allowable concentration of total As in paddy soil and provide the first derivation of a maximum allowable soil concentration for bioavailable As. We previously used meta-analysis to predict the maximum allowable total As in soil based on decision tree (DT) and logistic regression (LR) models. The models were defined using the maximum tolerable concentration (MTC) of As in rice grains as per the Codex recommendation. In the present study, we validated these models using three test data sets derived from purposely collected field data. The DT model performed better than the LR model in terms of accuracy and Matthews correlation coefficient (MCC). Therefore, the DT-estimated maximum allowable total As in paddy soil of 14 mg kg⁻¹ could confidently be used as an appropriate guideline value. We further used the purposely collected field data to predict the concentration of bioavailable As in the paddy soil with the help of random forest (RF), gradient boosting machine (GBM), and LR models. The category of grain As (defined by the MTC) was the dependent variable; bioavailable As (BAs), total As (TAs), pH, organic carbon (OC), available phosphorus (AvP), and available iron (AvFe) were the predictor variables. LR performed better than RF and GBM in terms of accuracy, sensitivity, specificity, kappa, precision, log loss, F1 score, and MCC. In the better-performing LR model, BAs, TAs, AvFe, and OC were significant variables for grain As. From the partial dependence plots (PDP) and individual conditional expectation (ICE) plots of the LR model, 5.70 mg kg⁻¹ was estimated to be the limit for BAs in soil.
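
The abstract compares models by accuracy and MCC. As a rough illustration of how these two metrics are computed for a binary outcome (grain As above or at/below the MTC), the sketch below scores a simple soil threshold rule. The sample values, the function names, and the "strictly above 14 mg kg⁻¹" rule are assumptions made for illustration only; they are not the authors' data, code, or fitted decision tree.

```python
import math

def classify_by_threshold(total_as_values, threshold=14.0):
    """Hypothetical rule: predict 1 (grain As exceeds MTC) when soil
    total As is strictly above the threshold (mg/kg), else 0."""
    return [1 if v > threshold else 0 for v in total_as_values]

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Fraction of samples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined
    (a common convention when any marginal total is zero)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Made-up example data (NOT from the paper): soil total As in mg/kg and
# whether the matching grain sample exceeded the MTC (1) or not (0).
soil_total_as = [5.0, 9.5, 13.0, 15.2, 18.4, 22.1, 11.0, 16.5]
grain_exceeds_mtc = [0, 0, 0, 1, 1, 1, 1, 0]

predicted = classify_by_threshold(soil_total_as, threshold=14.0)
counts = confusion_counts(grain_exceeds_mtc, predicted)
print("tp, tn, fp, fn:", counts)        # (3, 3, 1, 1)
print("accuracy:", accuracy(*counts))   # 0.75
print("MCC:", mcc(*counts))             # 0.5
```

MCC uses all four cells of the confusion matrix, so it stays informative when the two grain As categories are unevenly represented, which is one plausible reason to report it alongside accuracy.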
