4.7 Article

Machine learning model for diagnostic method prediction in parasitic disease using clinical information

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 185, Issue -, Pages -

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2021.115658

Keywords

Machine learning; Parasite; Diagnosis; Multi-classification; Binary-classification


Diagnosing parasitic diseases is challenging. This study constructed a machine learning model that uses patient information to predict whether a patient has a parasitic disease and which diagnostic method is appropriate, using two datasets extracted from PubMed abstracts. Gradient boosting with the synthetic minority over-sampling technique emerged as a promising tool for disease prediction and diagnosis method selection.
Diagnosing a parasitic disease is difficult in clinical practice. In this study, we constructed a machine learning model that predicts diagnosis from patient information. First, we predicted whether a patient has a parasitic disease. Next, for patients with a parasitic disease, we predicted the appropriate diagnostic method among six categories (biopsy, endoscopy, microscopy, molecular, radiology, and serology). To build the datasets, we extracted patient information from PubMed abstracts from 1956 to 2019. This yielded two datasets: one for predicting parasite-infected patients (N = 8748) and one for predicting the diagnosis method (N = 3780). We compared four machine learning models: support vector machine, random forest, multi-layered perceptron, and gradient boosting. To address the data imbalance problem, we applied the synthetic minority over-sampling technique (SMOTE) and TomekLinks. On the parasite-infected patient dataset, five configurations shared the best performance (AUC: 79%): random forest, random forest with SMOTE, gradient boosting, gradient boosting with SMOTE, and gradient boosting with TomekLinks. On the diagnosis method dataset, gradient boosting with SMOTE was the best model (AUC: 87%). For per-class prediction, gradient boosting performed best for biopsy (AUC: 88%); gradient boosting with SMOTE performed best for endoscopy (AUC: 94%), molecular (AUC: 90%), and radiology (AUC: 88%); and random forest performed best for microscopy (AUC: 82%) and serology (AUC: 85%). We calculated feature importance using gradient boosting; age was the most important feature.
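The synthetic minority over-sampling technique mentioned in the abstract creates new minority-class samples by interpolating between an existing minority sample and one of its nearest minority neighbors. The following is a minimal standard-library sketch of that idea, not the authors' implementation; the function name `smote`, its parameters, and the toy `[age, symptom score]` data are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by neighbor interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbors of the base sample (excluding itself)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Hypothetical toy data: [age, symptom score] for the minority class
minority = [[30.0, 1.0], [35.0, 2.0], [40.0, 1.5], [32.0, 1.2]]
new_points = smote(minority, n_new=6, k=2)
```

Because each synthetic point lies on a segment between two real minority samples, it stays inside the range of the observed minority data. In practice, oversampling of this kind is normally applied to the training split only, so that evaluation data contain no synthetic samples.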
In conclusion, this study demonstrated that gradient boosting with the synthetic minority over-sampling technique can predict a parasitic disease and serve as a promising diagnostic tool in both binary and multi-class classification schemes.
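The abstract reports an AUC for each diagnostic class but does not state how those values were computed. One common approach, assumed here, is one-vs-rest: each class in turn is treated as positive and all others as negative, and AUC is the fraction of positive/negative pairs that the model ranks correctly. The sketch below uses only the standard library; the function names and the toy predictions are hypothetical.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative examples")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(y_true, proba, classes):
    """Per-class AUC: treat each class as positive and all others as negative."""
    result = {}
    for idx, cls in enumerate(classes):
        labels = [1 if y == cls else 0 for y in y_true]
        scores = [row[idx] for row in proba]
        result[cls] = binary_auc(labels, scores)
    return result


# Hypothetical predicted probabilities for three of the six diagnostic classes
classes = ["biopsy", "microscopy", "serology"]
y_true = ["biopsy", "microscopy", "serology", "biopsy", "serology"]
proba = [
    [0.7, 0.2, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.3, 0.6],
    [0.4, 0.4, 0.2],
    [0.5, 0.1, 0.4],
]
per_class = one_vs_rest_auc(y_true, proba, classes)
macro_auc = sum(per_class.values()) / len(per_class)  # unweighted mean over classes
```

Averaging the per-class values without weights gives a macro AUC; whether the paper's overall multi-class figure was aggregated this way is not stated in the abstract.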
