Article

Correcting data imbalance for semi-supervised COVID-19 detection using X-ray chest images

APPLIED SOFT COMPUTING (2021)

Journal

APPLIED SOFT COMPUTING

Volume 111, Issue -, Pages -

Publisher

ELSEVIER

DOI: 10.1016/j.asoc.2021.107692

Keywords

Coronavirus; COVID-19; Computer aided diagnosis; Data imbalance; Semi-supervised learning

Categories

Computer Science, Artificial Intelligence; Computer Science, Interdisciplinary Applications

Funding

European Regional Development Fund (ERDF) [TIN2016-75097-P, RTI2018-094645-B-I00, UMA18-FEDERJA-084]
Universidad de Malaga, Spain

Summary

The identification of virus carriers is crucial in the fight against viral diseases such as COVID-19. Deep learning for image classification of chest X-ray images can be a useful pre-diagnostic detection methodology, but it requires large labelled datasets, which are often scarce in new research areas. The proposed method for correcting data imbalance improved classification accuracy by up to 18%.

Abstract

A key factor in the fight against viral diseases such as the coronavirus (COVID-19) is the identification of virus carriers as early and quickly as possible, in a cheap and efficient manner. The application of deep learning for image classification of chest X-ray images of COVID-19 patients could become a useful pre-diagnostic detection methodology. However, deep learning architectures require large labelled datasets. This is often a limitation when the subject of research is relatively new, as in the case of the virus outbreak, where dealing with small labelled datasets is a challenge. Moreover, in such a context, the datasets are also highly imbalanced, with few observations from positive cases of the new disease. In this work we evaluate the performance of the semi-supervised deep learning architecture known as MixMatch with a very limited number of labelled observations and highly imbalanced labelled datasets. We demonstrate the critical impact of data imbalance on the model's accuracy. We therefore propose a simple approach for correcting data imbalance: re-weighting each observation in the loss function, giving a higher weight to the observations corresponding to the under-represented class. For unlabelled observations, we use the pseudo and augmented labels calculated by MixMatch to choose the appropriate weight. The proposed method improved classification accuracy by up to 18% with respect to the non-balanced MixMatch algorithm. We tested our proposed approach with several available datasets using 10, 15 and 20 labelled observations for binary classification (COVID-19 positive and normal cases). For multi-class classification (COVID-19 positive, pneumonia and normal cases), we tested 30, 50, 70 and 90 labelled observations. Additionally, a new dataset is included among the tested datasets, composed of chest X-ray images of Costa Rican adult patients. © 2021 Elsevier B.V. All rights reserved.
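The re-weighting idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the weight formula or the exact loss terms, so everything concrete here is an assumption made for illustration: the inverse-frequency weights, the use of the most likely class of a (possibly soft) label to pick the weight, the cross-entropy form of the labelled term, the mean squared error form of the unlabelled term, and all function names. It should not be read as the authors' implementation.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """Assumed scheme: weight of class c is proportional to 1 / count(c),
    normalised so that the weights sum to 1."""
    if not labels:
        raise ValueError("at least one labelled observation is required")
    counts = Counter(labels)
    inverse = [1.0 / counts[c] if counts[c] else 0.0 for c in range(num_classes)]
    total = sum(inverse)
    return [w / total for w in inverse]


def most_likely_class(distribution):
    """Index of the largest entry of a one-hot or soft label."""
    return max(range(len(distribution)), key=distribution.__getitem__)


def weighted_cross_entropy(probs, targets, class_weights):
    """Labelled term: per-observation cross-entropy scaled by the weight
    of the observation's (most likely) target class."""
    losses = []
    for p, t in zip(probs, targets):
        ce = -sum(t_k * math.log(max(p_k, 1e-12)) for p_k, t_k in zip(p, t))
        losses.append(class_weights[most_likely_class(t)] * ce)
    return sum(losses) / len(losses)


def weighted_squared_error(probs, pseudo_labels, class_weights):
    """Unlabelled term: mean squared error against the pseudo-label, scaled
    by the weight of the class the pseudo-label considers most likely."""
    losses = []
    for p, q in zip(probs, pseudo_labels):
        mse = sum((p_k - q_k) ** 2 for p_k, q_k in zip(p, q)) / len(p)
        losses.append(class_weights[most_likely_class(q)] * mse)
    return sum(losses) / len(losses)


# Hypothetical labelled set: 8 normal cases (class 0), 2 COVID-19 positive (class 1).
labelled_classes = [0] * 8 + [1] * 2
weights = inverse_frequency_weights(labelled_classes, num_classes=2)

# The same uncertain prediction costs more when the true class is the rare one.
loss_minority = weighted_cross_entropy([[0.5, 0.5]], [[0.0, 1.0]], weights)
loss_majority = weighted_cross_entropy([[0.5, 0.5]], [[1.0, 0.0]], weights)

# Unlabelled observation whose pseudo-label leans towards the rare class.
loss_unlabelled = weighted_squared_error([[0.5, 0.5]], [[0.1, 0.9]], weights)
```

With this hypothetical 8-to-2 split, the under-represented class receives the larger weight, so errors on it contribute more to both loss terms. In a full semi-supervised setup the two terms would be combined with a balancing coefficient, which is omitted here.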
