Article

3D Shuffle-Mixer: An Efficient Context-Aware Vision Learner of Transformer-MLP Paradigm for Dense Prediction in Medical Volume

Journal

IEEE TRANSACTIONS ON MEDICAL IMAGING
Volume 42, Issue 5, Pages 1241-1253

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TMI.2022.3191974

Keywords

Transformers; Three-dimensional displays; Task analysis; Solid modeling; Medical diagnostic imaging; Adaptation models; Image analysis; Dense prediction in medical volume; context-aware; window-based multi-head self-attention; local vision transformer-MLP; adaptive scaled shortcut


Combining vision transformer with CNN in medical volume dense prediction shows promise and challenges. This paper proposes a novel 3D Shuffle-Mixer network using a local vision transformer-MLP paradigm for improved dense prediction in medical images. Experimental results demonstrate the superiority of the proposed model compared to other state-of-the-art methods for medical dense prediction.
Dense prediction in medical volumes provides enriched guidance for clinical analysis. CNN backbones have hit a bottleneck due to their lack of long-range dependency and global context modeling power. Recent works have proposed combining vision transformers with CNNs, owing to the transformer's strong global capture and learning capability. However, most works simply apply a pure transformer, which carries several serious drawbacks (i.e., lack of inductive bias, heavy computation, and little consideration for 3D data). Designing an elegant and efficient vision transformer learner for dense prediction in medical volumes is therefore both promising and challenging. In this paper, we propose a novel 3D Shuffle-Mixer network following a new Local Vision Transformer-MLP paradigm for medical dense prediction. In our network, a local vision transformer block is utilized to shuffle and learn spatial context from full-view slices of the rearranged volume, a residual axial-MLP is designed to mix and capture the remaining volume context in a slice-aware manner, and an MLP view aggregator is employed to project the learned full-view rich context onto the volume feature in a view-aware manner. Moreover, an Adaptive Scaled Enhanced Shortcut is proposed for the local vision transformer to enhance features adaptively along the spatial and channel dimensions, and a CrossMerge is proposed to skip-connect the multi-scale features appropriately in the pyramid architecture. Extensive experiments demonstrate that the proposed model outperforms other state-of-the-art medical dense prediction methods.
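The "full-view slices of the rearranged volume" mentioned in the abstract can be illustrated with a minimal sketch: a single 3D volume of shape (D, H, W) can be rearranged by axis transposition into three stacks of 2D slices, one per anatomical view. This is only an illustration of the general idea, assuming NumPy and a hypothetical helper name `full_view_slices`; the paper's actual rearrangement and shuffle operation may differ.

```python
import numpy as np

def full_view_slices(vol):
    """Rearrange a (D, H, W) volume into three full-view slice stacks.

    Each stack is a sequence of 2D slices taken along a different axis,
    loosely corresponding to the axial, coronal, and sagittal views.
    """
    axial = vol                        # (D, H, W): D slices of shape (H, W)
    coronal = vol.transpose(1, 0, 2)   # (H, D, W): H slices of shape (D, W)
    sagittal = vol.transpose(2, 0, 1)  # (W, D, H): W slices of shape (D, H)
    return axial, coronal, sagittal

vol = np.arange(2 * 3 * 4).reshape(2, 3, 4)
a, c, s = full_view_slices(vol)
print(a.shape, c.shape, s.shape)  # (2, 3, 4) (3, 2, 4) (4, 2, 3)
```

Because each stack is a pure transposition, the original volume is exactly recoverable from any single view (e.g., `c.transpose(1, 0, 2)` restores `vol`), so per-view processing loses no voxel information before the view-aware aggregation step.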
