Article

Predicting the performance of medium-chain carboxylic acid (MCCA) production using machine learning algorithms and microbial community data

Journal

JOURNAL OF CLEANER PRODUCTION
Volume 377

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.jclepro.2022.134223

Keywords

Medium-chain carboxylic acid; Carboxylate platform; Machine learning; Feature importance; Microbial community; Performance prediction

This study used machine learning algorithms to predict the concentration and production rate of medium-chain carboxylic acids (MCCA). Prediction accuracy exceeded 0.7 when only operational parameters were used and improved significantly when genomic data were incorporated. The random forest model achieved the highest prediction accuracy with each input set: operational parameters, genomic data, and the combined dataset. Hydraulic retention time and organic loading rate were identified as the main operational parameters affecting MCCA concentration and production rate. Bacteroidales and Coriobacteriales were found to be potential universal biomarkers for process monitoring.
The carboxylate platform-based bioprocess for medium-chain carboxylic acid (MCCA) production from waste biomass via mixed culture has been studied extensively because of the high economic value of MCCA and its potential environmental benefits. However, modeling the conversion process with mechanistic models is challenging because the interactions and metabolic pathways of the system are complex and poorly understood. Herein, four data-driven machine learning (ML) algorithms, namely random forest (RF), extreme gradient boosting (XGBoost), k-nearest neighbor (KNN), and artificial neural network (ANN), were employed to predict the MCCA concentration and production rate from data (environmental and operational parameters and the corresponding genomic data) collected under 94 experimental conditions from 8 research groups. All selected ML algorithms achieved a prediction accuracy higher than 0.7 using operational parameters only. A significant improvement in predictive efficacy (ranging from 0.83 to 0.87) was observed when the genomic data were combined with the environmental and operational parameters. The RF model gave the highest prediction accuracy for MCCA production: 0.83, 0.87, and 0.89 when the operational parameters, the genomic data, and the combined dataset were used as inputs, respectively. Based on the feature importance generated by RF, hydraulic retention time (HRT) and organic loading rate (OLR) were identified as the dominant operational parameters affecting the MCCA concentration and production rate. The key microbes affecting the MCCA concentration differed from those affecting the MCCA production rate. Bacteroidales and Coriobacteriales were the only orders sensitive to both the MCCA concentration and the production rate, with feature importance weights of 6.71% and 6.97%, respectively, and could be potential universal biomarkers for process monitoring. The results demonstrated that the proposed ML models could be used to simulate the carboxylate platform for MCCA production from waste feedstock, improve the understanding of microbial behavior in the process, and provide guidance for further optimization.
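
The comparison described in the abstract (the same model scored on operational parameters alone and then on operational plus genomic inputs) can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, the single "abundance" column stands in for a hypothetical taxon, the target formula is invented for illustration, and the score is assumed to be the coefficient of determination (R²), which the abstract does not state. KNN is used because it is one of the four algorithms named and needs only the Python standard library.

```python
import math
import random


def knn_predict(train_X, train_y, query, k=3):
    """Predict a target as the mean of the k nearest training targets (Euclidean distance)."""
    nearest = sorted(
        (math.dist(row, query), target) for row, target in zip(train_X, train_y)
    )[:k]
    return sum(target for _, target in nearest) / len(nearest)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def make_synthetic_dataset(n=94, seed=0):
    """Build n synthetic rows of (HRT, OLR, abundance), all scaled to [0, 1].

    ASSUMPTION: the linear target below is invented for illustration only;
    it is not taken from the paper.
    """
    rng = random.Random(seed)
    rows, targets = [], []
    for _ in range(n):
        hrt = rng.uniform(0, 1)        # scaled hydraulic retention time
        olr = rng.uniform(0, 1)        # scaled organic loading rate
        abundance = rng.uniform(0, 1)  # scaled abundance of one hypothetical taxon
        mcca = 2.0 * hrt + 1.0 * olr + 1.5 * abundance + rng.gauss(0, 0.05)
        rows.append((hrt, olr, abundance))
        targets.append(mcca)
    return rows, targets


def evaluate(rows, targets, columns, n_test=20, k=3):
    """Hold out the last n_test rows and return R² using only the selected columns."""
    X = [tuple(row[c] for c in columns) for row in rows]
    train_X, test_X = X[:-n_test], X[-n_test:]
    train_y, test_y = targets[:-n_test], targets[-n_test:]
    predictions = [knn_predict(train_X, train_y, q, k) for q in test_X]
    return r2_score(test_y, predictions)


rows, targets = make_synthetic_dataset()
r2_operational = evaluate(rows, targets, columns=(0, 1))   # HRT and OLR only
r2_combined = evaluate(rows, targets, columns=(0, 1, 2))   # plus the abundance column
print(f"R2, operational only: {r2_operational:.2f}")
print(f"R2, operational + abundance: {r2_combined:.2f}")
```

Because the synthetic target is built to depend on the abundance column, the two scores show how adding an informative input can change predictive performance; the values say nothing about real MCCA reactors.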
