Article

DS-TransUNet: Dual Swin Transformer U-Net for Medical Image Segmentation

Journal

IEEE Transactions on Instrumentation and Measurement

Publisher

Institute of Electrical and Electronics Engineers (IEEE), Inc.
DOI: 10.1109/TIM.2022.3178991

Keywords

Transformers; Image segmentation; Semantics; Decoding; Computer architecture; Task analysis; Medical diagnostic imaging; Hierarchical Swin Transformer; long-range contextual information; medical image segmentation; transformer interactive fusion (TIF) module

Funding

  1. NSFC Fund [62176077, 61906162]
  2. Guangdong Basic and Applied Basic Research Foundation [2019B1515120055]
  3. Shenzhen Key Technical Project [2020N046]
  4. Shenzhen Fundamental Research Fund [JCYJ20210324132210025, JCYJ20210324132212030]
  5. Medical Biometrics Perception and Analysis Engineering Laboratory, Shenzhen, China
  6. Shenzhen Science and Technology Program [RCBS20200714114910193]
  7. Education Center of Experiments and Innovations at Harbin Institute of Technology, Shenzhen

Automatic medical image segmentation has greatly benefited from powerful deep representation learning. This article proposes a novel framework called DS-TransUNet, which incorporates a hierarchical Swin Transformer into both the encoder and the decoder, enhancing semantic segmentation quality through self-attention computation and dual-scale encoding. Extensive experiments demonstrate the effectiveness of DS-TransUNet and its superiority over state-of-the-art methods on medical image segmentation tasks.

Automatic medical image segmentation has made great progress owing to powerful deep representation learning. Inspired by the success of the self-attention mechanism in transformers, considerable effort has been devoted to designing robust transformer-based variants of the encoder-decoder architecture. However, the patch division used in existing transformer-based models usually ignores the pixel-level intrinsic structural features inside each patch. In this article, we propose a novel deep medical image segmentation framework called dual Swin Transformer U-Net (DS-TransUNet), which incorporates the hierarchical Swin Transformer into both the encoder and the decoder of the standard U-shaped architecture. DS-TransUNet benefits from the self-attention computation in the Swin Transformer and the designed dual-scale encoding, which effectively model non-local dependencies and multiscale contexts, enhancing the semantic segmentation quality of diverse medical images. Unlike many prior transformer-based solutions, DS-TransUNet adopts a dual-scale encoding mechanism that uses two Swin Transformer encoders to extract coarse and fine-grained feature representations at different semantic scales. Meanwhile, a transformer interactive fusion (TIF) module is proposed to effectively perform multiscale information fusion through the self-attention mechanism. Furthermore, we introduce Swin Transformer blocks into the decoder to further exploit long-range contextual information during the up-sampling process. Extensive experiments across four typical medical image segmentation tasks demonstrate the effectiveness of DS-TransUNet, which significantly outperforms state-of-the-art methods.
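
The TIF module and dual-scale encoding described in the abstract can be illustrated with a short sketch. The following PyTorch snippet shows one plausible reading of the cross-scale fusion step, assuming each branch's feature map has already been flattened into a token sequence: a pooled summary token distilled from the other branch is prepended to the current branch's tokens, and a standard multi-head self-attention layer lets every token attend to that cross-scale summary. The TIFBranch class, its layer choices, and the 28x28 / 14x14 token grids are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class TIFBranch(nn.Module):
    """Hypothetical cross-scale fusion: tokens of one scale attend to a
    pooled summary token taken from the other scale."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        # Project the pooled cross-scale feature into this branch's embedding space.
        self.summary_proj = nn.Linear(dim, dim)
        # A standard transformer encoder layer performs the fusion via self-attention.
        self.encoder = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads, batch_first=True)

    def forward(self, tokens: torch.Tensor, other_tokens: torch.Tensor) -> torch.Tensor:
        # tokens:       (B, N, C) sequence from this branch
        # other_tokens: (B, M, C) sequence from the other (coarser or finer) branch
        summary = self.summary_proj(other_tokens.mean(dim=1, keepdim=True))  # (B, 1, C)
        fused = self.encoder(torch.cat([summary, tokens], dim=1))            # (B, 1+N, C)
        return fused[:, 1:]  # drop the summary slot, keep the refined branch tokens

if __name__ == "__main__":
    # Toy example: fuse a fine-grained branch (28x28 patches) with a coarse branch (14x14 patches).
    fine = torch.randn(2, 28 * 28, 96)
    coarse = torch.randn(2, 14 * 14, 96)
    out = TIFBranch(dim=96)(fine, coarse)
    print(out.shape)  # torch.Size([2, 784, 96])

Prepending a single summary token keeps the attention cost close to that of the branch's own sequence while still injecting global context from the other scale; in the full dual-scale design, a symmetric TIFBranch would update the coarse branch with a summary of the fine-grained one.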
