Article

Multiple-layer statistical methodology for developing data-driven models of anaerobic digestion process

Journal

JOURNAL OF ENVIRONMENTAL MANAGEMENT
Volume 347

Publisher

ACADEMIC PRESS LTD- ELSEVIER SCIENCE LTD
DOI: 10.1016/j.jenvman.2023.119153

Keywords

Anaerobic digestion; Biogas; Data-driven model; Multiple linear regression; Statistics

In this study, a multilayer statistical technique using regression models was employed to support the development of anaerobic digestion models. Experimental data from lab-scale, pilot-scale, and full-scale reactors were used to demonstrate the modelling process. The developed models showed high accuracy in predicting biogas production during anaerobic digestion.
When modelling anaerobic digestion, ineffective data handling and poorly chosen modelling parameters can undermine model reliability. In this study, a multilayer statistical technique built on regression-based machine learning was introduced to systematically support the development of anaerobic digestion models. The layers were applied in sequence: cubic smoothing splines (reconstructing missing data), principal component analysis (identifying correlated parameters), analysis of variance (analysing differences among datasets), and linear regression (developing data-driven models). Together they were used to develop and validate anaerobic digestion models. Experimental data collected from the long-term operation of lab-scale (350 days), pilot-scale (150 days), and full-scale (750 days) reactors were used to demonstrate the modelling process. Multivariate data-driven models were developed from the experimental and monitored data. The developed models could predict biogas production and effluent chemical oxygen demand during anaerobic digestion. Statistical analyses verified the modelling hypotheses, prevented the development of invalid models, and ensured data integrity and parameter validity. Multiple linear regression of principal components demonstrated that biogas production from food waste was influenced by the variances of the nitrogen and organic concentrations, but not by the chemical oxygen demand to total nitrogen (C/N) ratio. In the validation process, the model developed with lab-scale reactor data showed relatively high accuracy, with R², SSE, and RMSE values of 0.86, 34.45, and 0.72, respectively.
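The last two layers described above (principal component analysis followed by linear regression, scored with R², SSE, and RMSE) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, the function names (fit_pcr, predict_pcr, fit_metrics), the variable names, the coefficients, and the train/validation split are all hypothetical, and RMSE is assumed to divide SSE by the number of samples. The spline gap-filling and analysis of variance layers are omitted.

```python
import numpy as np

def fit_pcr(X, y, n_components):
    """Principal component regression: PCA on standardized X, then OLS on the scores."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std
    # Rows of Vt are the principal axes, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    axes = Vt[:n_components].T
    scores = Z @ axes
    # Ordinary least squares on an intercept plus the retained scores.
    design = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return {"mean": mean, "std": std, "axes": axes, "coef": coef}

def predict_pcr(model, X):
    """Project new samples onto the stored axes and apply the fitted coefficients."""
    Z = (X - model["mean"]) / model["std"]
    scores = Z @ model["axes"]
    return model["coef"][0] + scores @ model["coef"][1:]

def fit_metrics(y_true, y_pred):
    """R2, SSE, and RMSE (assumption: RMSE = sqrt(SSE / n))."""
    residuals = y_true - y_pred
    sse = float(np.sum(residuals ** 2))
    rmse = float(np.sqrt(sse / len(y_true)))
    sst = float(np.sum((y_true - y_true.mean()) ** 2))
    return {"R2": 1.0 - sse / sst, "SSE": sse, "RMSE": rmse}

# Synthetic stand-in data (NOT from the paper): two independent inputs
# and a ratio derived from them, so the three columns are correlated.
rng = np.random.default_rng(0)
n = 200
organic = rng.normal(50.0, 10.0, n)   # hypothetical organic concentration
nitrogen = rng.normal(3.0, 0.5, n)    # hypothetical nitrogen concentration
ratio = organic / nitrogen            # hypothetical C/N-like ratio
X = np.column_stack([organic, nitrogen, ratio])
biogas = 0.04 * organic - 0.5 * nitrogen + rng.normal(0.0, 0.1, n)

# Fit on the first 150 samples, validate on the remaining 50.
train, test = slice(0, 150), slice(150, n)
model = fit_pcr(X[train], biogas[train], n_components=2)
metrics = fit_metrics(biogas[test], predict_pcr(model, X[test]))
print(metrics)
```

Regressing on principal component scores rather than on the raw columns sidesteps the collinearity between the ratio and the two concentrations it is computed from, which is the kind of correlated-parameter problem the principal component layer is meant to expose.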
