4.7 Article

MuSiC-ViT: A multi-task Siamese convolutional vision transformer for differentiating change from no-change in follow-up chest radiographs

Journal

MEDICAL IMAGE ANALYSIS
Volume 89, Issue -, Pages -

Publisher

ELSEVIER
DOI: 10.1016/j.media.2023.102894

Keywords

CNNs meet vision transformers; Follow-up chest radiographs; Multi-task learning; Vision transformer; Siamese network

Ask authors/readers for more resources

A major responsibility of radiologists is to read follow-up chest radiographs and differentiate meaningful changes from natural or benign variations. This study proposes the use of a multi-task Siamese convolutional vision transformer with an anatomy-matching module to mimic the radiologist's cognitive process.
A major responsibility of radiologists in routine clinical practice is to read follow-up chest radiographs (CXRs) to identify changes in a patient's condition. Diagnosing meaningful changes in follow-up CXRs is challenging because radiologists must differentiate disease changes from natural or benign variations. Here, we suggest using a multi-task Siamese convolutional vision transformer (MuSiC-ViT) with an anatomy-matching module (AMM) to mimic the radiologist's cognitive process for differentiating baseline change from no-change. MuSiC-ViT uses the convolutional neural networks (CNNs) meet vision transformers model that combines CNN and transformer architecture. It has three major components: a Siamese network architecture, an AMM, and multi-task learning. Because the input is a pair of CXRs, a Siamese network was adopted for the encoder. The AMM is an attention module that focuses on related regions in the CXR pairs. To mimic a radiologist's cognitive process, MuSiC-ViT was trained using multi-task learning, normal/abnormal and change/no-change classification, and anatomymatching. Among 406 K CXRs studied, 88 K change and 115 K no-change pairs were acquired for the training dataset. The internal validation dataset consisted of 1,620 pairs. To demonstrate the robustness of MuSiC-ViT, we verified the results with two other validation datasets. MuSiC-ViT respectively achieved accuracies and area under the receiver operating characteristic curves of 0.728 and 0.797 on the internal validation dataset, 0.614 and 0.784 on the first external validation dataset, and 0.745 and 0.858 on a second temporally separated validation dataset. All code is available at https://github.com/chokyungjin/MuSiC-ViT.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.7
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available