Article

Multi-criteria text mining model for COVID-19 testing reasons and symptoms and temporal predictive model for COVID-19 test results in rural communities

Journal

NEURAL COMPUTING & APPLICATIONS
Volume 34, Issue 10, Pages 7523-7536

Publisher

SPRINGER LONDON LTD
DOI: 10.1007/s00521-021-06884-w

Keywords

Classification; Community health; Machine learning; Population health analytics; Primary care; Text mining


This study builds a multi-criteria text mining model for COVID-19 testing reasons and symptoms and integrates it with a temporal predictive classification model for COVID-19 test results in underserved rural areas. The dataset contains 6895 testing appointments described by 14 features. Using look-up wordlists and a multi-criteria mapping process, the text mining model assigns each note on testing reasons and reported symptoms to one or more categories. This converts an unstructured feature into a categorical feature, which is then used to build the temporal predictive classification model for COVID-19 test results and to conduct population analytics. The classification model is temporal (records are ordered and indexed by testing date) and uses machine learning classifiers to predict whether a test result is positive or negative. Two balanced classifiers are used, (1) balanced random forest and (2) balanced bagged decision tree, and both balanced and regular methods are applied for classification and for performance measurement. The balanced (weighted) methods account for the biased, imbalanced dataset and aim to ensure correct detection of patients with COVID-19, the minority class. The model is tested in two stages, on a validation set and a testing set, to ensure robustness and reliability. The balanced classifiers outperformed the regular classifiers on the balanced performance measures (balanced accuracy and G-score), indicating that they are better at detecting patients with positive COVID-19 results. On the validation set, the balanced random forest achieved the best average balanced accuracy (86.1%) and G-score (86.1%); on the testing set, the balanced bagged decision tree achieved the best average balanced accuracy (83.0%) and G-score (82.8%). Patient history, age, testing reasons, and time were found to be the key features for classifying test results.
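The abstract describes mapping free-text notes to one or more categories via look-up wordlists. The sketch below illustrates that idea as simple multi-label keyword matching; the category names and terms in `WORDLISTS`, the fallback label `"other"`, and the helper names `classify_note` and `to_indicators` are hypothetical and are not taken from the study, whose actual wordlists and mapping criteria are not given here.

```python
import re

# Hypothetical look-up wordlists; the study's real categories and terms are not shown here.
WORDLISTS = {
    "exposure": ["exposed", "contact", "close contact"],
    "travel": ["travel", "trip", "flight"],
    "fever": ["fever", "chills", "temperature"],
    "respiratory": ["cough", "shortness of breath", "sore throat"],
    "loss_of_taste_smell": ["loss of taste", "loss of smell"],
}


def classify_note(note: str, wordlists: dict[str, list[str]] = WORDLISTS) -> list[str]:
    """Return every category whose wordlist matches the note (a note may get several)."""
    text = note.lower()
    matched = []
    for category, terms in wordlists.items():
        # Whole-word/phrase match, so "contact" does not fire on "contactless".
        if any(re.search(r"\b" + re.escape(term) + r"\b", text) for term in terms):
            matched.append(category)
    # Hypothetical fallback when no wordlist matches.
    return matched or ["other"]


def to_indicators(categories: list[str], all_categories=(*WORDLISTS, "other")) -> dict[str, int]:
    """Expand the matched categories into one 0/1 column per category."""
    return {c: int(c in categories) for c in all_categories}
```

For example, `classify_note("Close contact at work, has fever and cough")` returns `["exposure", "fever", "respiratory"]`, and `to_indicators` turns that list into binary columns a classifier can consume.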
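Because the classification model is ordered and indexed by testing date, one natural way to form the training, validation, and testing sets is a chronological split, so that evaluation always uses appointments later than those seen in training. This is a minimal sketch under stated assumptions: the record schema (a `"test_date"` key holding an ISO `YYYY-MM-DD` string or a `datetime.date`), the function name `temporal_split`, and the 60/20/20 fractions are all hypothetical, since the abstract does not give the study's split proportions.

```python
def temporal_split(records, train_frac=0.6, val_frac=0.2):
    """Order records by testing date and split them chronologically.

    `records` is a list of dicts with a "test_date" key (hypothetical schema).
    Earliest appointments go to training, later ones to validation, and the
    latest to testing, so no set is evaluated on tests older than its training data.
    The 60/20/20 default fractions are illustrative assumptions.
    """
    ordered = sorted(records, key=lambda r: r["test_date"])
    n = len(ordered)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = ordered[:n_train]
    validation = ordered[n_train:n_train + n_val]
    testing = ordered[n_train + n_val:]
    return train, validation, testing
```

A chronological split avoids the optimistic bias of a random split, in which a model could be trained on appointments that occur after the ones it is evaluated on.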
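Balanced random forest and balanced bagging handle class imbalance by giving each tree a training sample in which the minority class (positive tests) is as common as the majority class. The sketch below shows one common way to draw such a class-balanced bootstrap sample; the function name `balanced_bootstrap` is hypothetical, and the study's exact resampling settings are not stated in the abstract. In practice, the imbalanced-learn library provides ready-made `BalancedRandomForestClassifier` and `BalancedBaggingClassifier` estimators, though the abstract does not say which implementation the authors used.

```python
import random


def balanced_bootstrap(labels, rng):
    """Draw one class-balanced bootstrap sample of row indices.

    Each class is resampled with replacement down (or up) to the size of the
    minority class, so every tree trained on the sample sees as many positive
    as negative tests. This is a sketch, not the study's exact procedure.
    """
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    sample = []
    for idx in by_class.values():
        sample.extend(rng.choices(idx, k=n_min))
    return sample
```

With 90 negative and 10 positive labels, each call returns 20 indices, 10 from each class; repeating the call once per tree and fitting a decision tree on each sample gives a balanced ensemble.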
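The abstract evaluates models with balanced accuracy and G-score rather than plain accuracy. Balanced accuracy is the mean of sensitivity and specificity; the sketch below assumes that G-score denotes the geometric mean of sensitivity and specificity (the common G-mean for imbalanced data), which the abstract does not define explicitly. The function name `balanced_metrics` is hypothetical.

```python
import math


def balanced_metrics(tp, fn, tn, fp):
    """Return (balanced accuracy, G-score) from confusion-matrix counts.

    Assumes G-score = sqrt(sensitivity * specificity). Both classes must be
    present (tp + fn > 0 and tn + fp > 0) to avoid division by zero.
    """
    sensitivity = tp / (tp + fn)  # recall on positive (minority) tests
    specificity = tn / (tn + fp)  # recall on negative (majority) tests
    balanced_accuracy = (sensitivity + specificity) / 2
    g_score = math.sqrt(sensitivity * specificity)
    return balanced_accuracy, g_score
```

For a classifier that labels every one of 90 negative and 10 positive tests as negative, regular accuracy is 90%, yet `balanced_metrics(0, 10, 90, 0)` gives a balanced accuracy of 0.5 and a G-score of 0.0, which exposes its failure to detect any positive case.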
