4.7 Article

Correcting data imbalance for semi-supervised COVID-19 detection using X-ray chest images

Journal

APPLIED SOFT COMPUTING
Volume 111, Issue -, Pages -

Publisher

ELSEVIER
DOI: 10.1016/j.asoc.2021.107692

Keywords

Coronavirus; COVID-19; Computer aided diagnosis; Data imbalance; Semi-supervised learning

Funding

  1. European Regional Development Fund (ERDF) [TIN2016-75097-P, RTI2018-094645-B-I00, UMA18-FEDERJA-084]
  2. Universidad de Malaga, Spain

The identification of virus carriers is crucial in the fight against viral diseases such as COVID-19. Deep learning for image classification of chest X-ray images can be a useful pre-diagnostic detection methodology, but it requires large labelled datasets which can be limited in new research areas. Our proposed method for correcting data imbalance improved classification accuracy by up to 18%.
A key factor in the fight against viral diseases such as the coronavirus (COVID-19) is identifying virus carriers as early and quickly as possible, in a cheap and efficient manner. Applying deep learning to the classification of chest X-ray images of COVID-19 patients could become a useful pre-diagnostic detection methodology. However, deep learning architectures require large labelled datasets. This is often a limitation when the subject of research is relatively new, as in the case of the virus outbreak, where working with small labelled datasets is a challenge. Moreover, in such a context the datasets are also highly imbalanced, with few observations from positive cases of the new disease. In this work we evaluate the performance of the semi-supervised deep learning architecture known as MixMatch with a very limited number of labelled observations and highly imbalanced labelled datasets. We demonstrate the critical impact of data imbalance on the model's accuracy. We therefore propose a simple approach for correcting data imbalance: each observation is re-weighted in the loss function, with a higher weight given to observations from the under-represented class. For unlabelled observations, we use the pseudo and augmented labels calculated by MixMatch to choose the appropriate weight. The proposed method improved classification accuracy by up to 18% with respect to the non-balanced MixMatch algorithm. We tested the proposed approach on several available datasets using 10, 15 and 20 labelled observations for binary classification (COVID-19 positive and normal cases). For multi-class classification (COVID-19 positive, pneumonia and normal cases), we tested 30, 50, 70 and 90 labelled observations. Additionally, a new dataset composed of chest X-ray images of Costa Rican adult patients is included among the tested datasets. © 2021 Elsevier B.V. All rights reserved.
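The re-weighting idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the inverse-frequency weighting scheme, the function names (`class_weights`, `weighted_cross_entropy`, `weighted_squared_error`) and the toy label counts are all assumptions made for illustration. The only parts taken from the abstract are that each observation gets a per-class weight in the loss, and that for unlabelled observations the weight is chosen from the pseudo-label.

```python
import numpy as np

def class_weights(labels, num_classes):
    """Assumed scheme: weights inversely proportional to class frequency,
    normalised to sum to 1. The paper may use a different formula."""
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    inv = 1.0 / np.maximum(counts, 1.0)  # guard against empty classes
    return inv / inv.sum()

def weighted_cross_entropy(probs, targets, weights, eps=1e-12):
    """Labelled term: cross-entropy with one weight per observation.
    targets is (n, k), one-hot or soft; the weight is selected by the
    class with the largest target probability."""
    per_obs_w = weights[targets.argmax(axis=1)]
    ce = -(targets * np.log(probs + eps)).sum(axis=1)
    return float((per_obs_w * ce).mean())

def weighted_squared_error(probs, pseudo_targets, weights):
    """Unlabelled term: squared error against the pseudo-labels, with the
    weight chosen from the pseudo-label's most likely class."""
    per_obs_w = weights[pseudo_targets.argmax(axis=1)]
    se = ((probs - pseudo_targets) ** 2).mean(axis=1)
    return float((per_obs_w * se).mean())

# Toy imbalanced labelled set: 8 normal (class 0), 2 COVID-19 positive (class 1).
labels = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
w = class_weights(labels, num_classes=2)

# The same-sized mistake costs more on the under-represented class.
miss_minority = weighted_cross_entropy(
    np.array([[0.9, 0.1]]), np.array([[0.0, 1.0]]), w)
miss_majority = weighted_cross_entropy(
    np.array([[0.1, 0.9]]), np.array([[1.0, 0.0]]), w)
print(w, miss_minority, miss_majority)
```

In a full MixMatch training loop these two terms would be combined as the labelled loss plus a scaled unlabelled loss; the sketch only shows where the per-observation weights would enter.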
