Proceedings Paper

MIXED TRANSFORMER U-NET FOR MEDICAL IMAGE SEGMENTATION

Publisher

IEEE
DOI: 10.1109/ICASSP43922.2022.9746172

Keywords

Medical image segmentation; Deep learning; Vision Transformer; Self-attention

Funding

  1. Natural Science Foundation of Zhejiang Province [LZ22F020012]
  2. Major Scientific Research Project of Zhejiang Lab [2020ND8AD01]
  3. Ministry of Education, Culture, Sports, Science and Technology (MEXT), Japan [20KK0234, 21H03470, 20K21821]

Abstract

Though U-Net has achieved tremendous success in medical image segmentation, it lacks the ability to explicitly model long-range dependencies. Vision Transformers have therefore emerged as alternative segmentation structures, owing to their innate ability to capture long-range correlations through Self-Attention (SA). However, Transformers usually rely on large-scale pre-training and have high computational complexity. Furthermore, SA can only model self-affinities within a single sample, ignoring the potential correlations across the overall dataset. To address these problems, we propose a novel Transformer module named Mixed Transformer Module (MTM) for simultaneous inter- and intra-affinity learning. MTM first calculates self-affinities efficiently through our well-designed Local-Global Gaussian-Weighted Self-Attention (LGG-SA). It then mines inter-connections between data samples through External Attention (EA). Using MTM, we construct a U-shaped model named Mixed Transformer U-Net (MT-UNet) for accurate medical image segmentation. We evaluate our method on two public datasets, and the experimental results show that the proposed method outperforms other state-of-the-art methods. The code is available at: https://github.com/Dootmaan/MT-UNet.
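For intuition about the Gaussian weighting in LGG-SA, the following is a minimal PyTorch sketch, not the authors' implementation (see the repository above for that): attention scores between tokens on an h x w grid are damped by exp(-d^2 / (2*sigma^2)), where d is the spatial distance between query and key positions, so nearby tokens dominate. The class and function names, the single-head design, and the sigma value are illustrative assumptions; the full LGG-SA also combines a fine local branch with a coarser global branch, which this sketch omits.

```python
import torch
import torch.nn as nn


def gaussian_bias(h, w, sigma=3.0):
    """Pairwise weights exp(-d^2 / (2*sigma^2)) over an h*w token grid.
    Illustrative stand-in for the distance prior in LGG-SA."""
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    coords = torch.stack([ys.flatten(), xs.flatten()], dim=-1).float()  # (N, 2)
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)       # (N, N)
    return torch.exp(-d2 / (2 * sigma ** 2))


class GaussianWeightedSA(nn.Module):
    """Single-head self-attention whose scores are modulated by a
    distance-based Gaussian, so closer tokens receive larger weights."""

    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.scale = dim ** -0.5

    def forward(self, x, h, w):                        # x: (B, N, C), N = h*w
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.scale  # (B, N, N)
        attn = attn.softmax(dim=-1) * gaussian_bias(h, w).to(x.device)
        attn = attn / (attn.sum(dim=-1, keepdim=True) + 1e-6)  # renormalize
        return attn @ v
```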
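The EA component follows External Attention (Guo et al., 2021), which replaces the sample-internal key/value pairs with two small learnable memory units shared across the whole dataset; this is how MTM mines inter-sample correlations that plain SA ignores. Below is a minimal sketch under that assumption, with the module name and mem_size chosen for illustration. Note the double normalization (softmax over tokens, then an l1-style normalization over memory slots), which is the standard EA recipe.

```python
import torch.nn as nn


class ExternalAttention(nn.Module):
    """Sketch of External Attention: affinities are computed against two
    learnable memories (M_k, M_v) shared across all samples, giving
    linear complexity in the number of tokens."""

    def __init__(self, dim, mem_size=64):
        super().__init__()
        self.mk = nn.Linear(dim, mem_size, bias=False)  # key memory M_k
        self.mv = nn.Linear(mem_size, dim, bias=False)  # value memory M_v

    def forward(self, x):                  # x: (B, N, C)
        attn = self.mk(x)                  # (B, N, S) affinities to memory
        attn = attn.softmax(dim=1)         # normalize over tokens
        attn = attn / (attn.sum(dim=2, keepdim=True) + 1e-6)  # over slots
        return self.mv(attn)               # (B, N, C)
```

Because the memories are parameters rather than per-sample projections, the cost is O(N * S) instead of O(N^2), which is consistent with the abstract's efficiency claim.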
