4.3 Article

High-Efficiency Machine Learning Method for Identifying Foodborne Disease Outbreaks and Confounding Factors

Journal

FOODBORNE PATHOGENS AND DISEASE
Volume 18, Issue 8, Pages 590-598

Publisher

MARY ANN LIEBERT, INC
DOI: 10.1089/fpd.2020.2913

Keywords

foodborne disease outbreaks; machine learning; foodborne disease

Funding

  1. National Key Research and Development Plan [2017YFC1601504]
  2. Natural Science Foundation of China [61836013]

This study used machine learning models to monitor and identify foodborne disease outbreaks; the eXtreme Gradient Boosting (XGBoost) model performed best when evaluated by recall rate and F1-score.
The China National Center for Food Safety Risk Assessment (CFSA) uses the Foodborne Disease Monitoring and Reporting System (FDMRS) to monitor outbreaks of foodborne diseases across the country. However, the FDMRS suffers from underreporting and erroneous reporting, which significantly increase the cost of related epidemic investigations. To address this problem, we designed a model to identify suspected outbreaks from the data generated by the CFSA's FDMRS. Machine learning models were fitted to the data, and the recall rate and F1-score were used as evaluation metrics to compare the classification performance of each model. Feature importance and pathogenic factors were identified and analyzed using tree-based and gradient boosting models. Three real foodborne disease outbreaks were then used to evaluate the best-performing model, and the SHapley Additive exPlanation (SHAP) value was used to identify the effect of each feature. Among all the machine learning classification models, the eXtreme Gradient Boosting (XGBoost) model achieved the best performance, with the highest recall rate and F1-score of 0.9699 and 0.9582, respectively. In model validation, the model judged the real outbreaks correctly. In the feature importance analysis with the XGBoost model, the health status of other people with the same exposure had the highest weight, reaching 0.65. The machine learning model built in this study exhibits high accuracy in recognizing foodborne disease outbreaks, thus reducing the manual burden on medical staff. The model also helped us identify the confounding factors of foodborne disease outbreaks. Attention should be paid not only to the health status of those with the same exposure but also to the similarity of the cases in time and space.
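
The abstract compares models by recall rate and F1-score. As a minimal sketch of how these two metrics are computed for a binary outbreak/non-outbreak classifier, the Python below works through a toy example. The function name `recall_and_f1` and the label lists are hypothetical and are not taken from the study or from FDMRS data.

```python
def recall_and_f1(y_true, y_pred):
    """Return (recall, F1) for binary labels, where 1 marks a suspected outbreak."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # outbreaks correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed outbreaks

    # Recall: share of true outbreaks that the model caught.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Precision: share of flagged cases that were true outbreaks.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, f1


# Toy labels, invented for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
recall, f1 = recall_and_f1(y_true, y_pred)
print(f"recall={recall:.4f}, F1={f1:.4f}")
```

Recall is a natural primary metric in this setting because a missed outbreak (a false negative) is usually costlier than a false alarm, while F1 keeps the false-alarm rate from being ignored entirely.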
