Article

MDA-Unet: A Multi-Scale Dilated Attention U-Net for Medical Image Segmentation

Journal

APPLIED SCIENCES-BASEL
Volume 12, Issue 7, Pages -

Publisher

MDPI
DOI: 10.3390/app12073676

Keywords

deep learning; U-Net; medical images; segmentation; computed tomography; echocardiography


The researchers propose MDA-Unet, a novel multi-scale deep learning segmentation model intended to improve medical image segmentation. The model addresses two weaknesses of U-Net, the semantic gap across skip connections and the limited capture of multi-scale context information, by introducing a multi-scale spatial attention module and residual blocks. Evaluated on two different datasets, the model shows significant performance gains over the basic U-Net model.
Recent advances in deep learning have significantly improved medical image segmentation. Encoder-decoder networks such as U-Net have addressed some of the challenges in this task with outstanding performance, making them the dominant deep learning architecture in the domain. Despite this performance, we argue that they still fall short in two respects. First, U-Net's skip connections combine incompatible features: the semantic gap between lightly processed encoder features and highly processed decoder features adversely affects the final prediction. Second, U-Net does not capture multi-scale context information and ignores the contribution of all semantic information throughout the segmentation process. We therefore propose MDA-Unet, a novel multi-scale deep learning segmentation model. MDA-Unet improves upon U-Net and enhances its performance in segmenting medical images whose region of interest varies in shape and size. The model integrates a multi-scale spatial attention module, in which spatial attention maps are derived from a hybrid hierarchical dilated convolution module that captures multi-scale context information. To ease training and reduce the vanishing gradient problem, residual blocks replace the basic U-Net blocks. Through a channel attention mechanism, the high-level decoder features guide the low-level encoder features to promote the selection of meaningful context information, thus ensuring effective fusion. We evaluated our model on two different datasets, each with its own challenges: a lung dataset of 2628 axial CT images and an echocardiographic dataset of 2000 images.
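To make the idea of multi-scale context from dilated convolutions more concrete, the following minimal sketch computes how far a dilated convolution "sees". It is an illustration only: the 3x3 kernel size and the dilation rates 1, 2, 3 are assumptions chosen for the example, not values taken from the paper, and the abstract does not say whether the module's branches are applied in parallel or in series, so both cases are shown.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span (in pixels) covered by one dilated kernel along one axis.

    A dilation of d inserts d - 1 gaps between adjacent kernel taps.
    """
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def stacked_receptive_field(kernel_size: int, dilations: list[int]) -> int:
    """Receptive field of stride-1 dilated convolutions applied in series."""
    receptive_field = 1
    for dilation in dilations:
        receptive_field += (kernel_size - 1) * dilation
    return receptive_field


# Hypothetical settings for illustration; not taken from the MDA-Unet paper.
ASSUMED_KERNEL_SIZE = 3
ASSUMED_DILATIONS = [1, 2, 3]

for d in ASSUMED_DILATIONS:
    span = effective_kernel_size(ASSUMED_KERNEL_SIZE, d)
    print(f"dilation={d}: one branch covers a {span}x{span} window")

total = stacked_receptive_field(ASSUMED_KERNEL_SIZE, ASSUMED_DILATIONS)
print(f"stacked in series: {total}x{total} receptive field")
```

Under these assumed settings, parallel branches would see context at several window sizes with the same number of weights per branch, which is the general motivation for using dilation to gather multi-scale information.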
Our model achieves a significant gain in performance with only a slight increase in the number of trainable parameters compared with the basic U-Net model: it reaches a Dice score of 98.3% on the lung dataset and 96.7% on the echocardiographic dataset, whereas the basic U-Net achieves 94.2% and 93.9%, respectively.
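For readers unfamiliar with the metric reported above, the Dice score of a predicted mask A against a ground-truth mask B is 2|A ∩ B| / (|A| + |B|). The sketch below computes it for small binary masks in plain Python. The function name, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration; the paper's own evaluation code may differ.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as nested lists."""
    pred_flat = [int(bool(v)) for row in pred for v in row]
    target_flat = [int(bool(v)) for row in target for v in row]
    if len(pred_flat) != len(target_flat):
        raise ValueError("masks must contain the same number of pixels")

    # Pixels labelled foreground in both masks.
    intersection = sum(p & t for p, t in zip(pred_flat, target_flat))
    total = sum(pred_flat) + sum(target_flat)

    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy masks (hypothetical data): 3 predicted pixels, 2 true pixels, 2 shared.
predicted = [[0, 1, 1],
             [0, 1, 0]]
ground_truth = [[0, 1, 1],
                [0, 0, 0]]

print(f"Dice = {dice_score(predicted, ground_truth):.3f}")
```

A score of 1.0 means the masks overlap exactly and 0.0 means they share no foreground pixels, so the reported values correspond to 0.983 and 0.967 on this scale.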
