4.7 Article

SwinE-Net: hybrid deep learning approach to novel polyp segmentation using convolutional neural network and Swin Transformer

Journal

JOURNAL OF COMPUTATIONAL DESIGN AND ENGINEERING
Volume 9, Issue 2, Pages 616-632

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/jcde/qwac018

Keywords

polyp segmentation; convolutional neural networks; multidilation convolutional block; multifeature aggregation block; Swin Transformer; Vision Transformer

Funding

  1. Basic Science Research Program through the National Research Foundation of Korea (NRF) - Ministry of Education [2019R1I1A3A01059082]
  2. Korea Health Technology Research and Development Project through the Korea Health Industry Development Institute (KHIDI) - Ministry of Health and Welfare [HI19C0642]
  3. National Research Foundation of Korea [2019R1I1A3A01059082] Funding Source: Korea Institute of Science & Technology Information (KISTI), National Science & Technology Information Service (NTIS)

This study proposes a novel deep learning model, SwinE-Net, for accurate colorectal polyp segmentation. The model effectively combines a CNN-based EfficientNet and Vision Transformer-based Swin Transformer. The proposed approach is evaluated and compared on multiple datasets, demonstrating its superior performance in polyp segmentation.
Prevention of colorectal cancer (CRC) by inspecting and removing colorectal polyps has become a global health priority because CRC is one of the most frequent cancers in the world. Although recent U-Net-based convolutional neural networks (CNNs) with deep feature representation and skip connections have been shown to segment polyps effectively, U-Net-based approaches still have limitations in modeling explicit global contexts, due to the intrinsic locality of convolutional operations. To overcome these problems, this study proposes a novel deep learning model, SwinE-Net, for polyp segmentation that effectively combines a CNN-based EfficientNet and a Vision Transformer (ViT)-based Swin Transformer. The main challenge is to perform accurate and robust medical segmentation that maintains global semantics through the Swin Transformer without sacrificing the low-level features of CNNs. First, the multidilation convolutional block generates refined feature maps to enhance feature discriminability for multilevel feature maps extracted from the CNN and ViT. Then, the multifeature aggregation block creates intermediate side outputs from the refined polyp features for efficient training. Finally, the attentive deconvolutional network-based decoder upsamples the refined and combined feature maps to accurately segment colorectal polyps. We compared the proposed approach with previous state-of-the-art methods by evaluating various metrics using five public datasets (Kvasir, ClinicDB, ColonDB, ETIS, and EndoScene). The comparative evaluation, in particular, showed that the proposed approach performed much better on unseen datasets, which indicates its generalization and scalability in polyp segmentation. Furthermore, an ablation study was performed to demonstrate the novelty and advantage of the proposed network. The proposed approach outperformed previous studies.
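
The abstract does not spell out the internals of the multidilation convolutional block. As a rough illustration of the general idea behind such a block (the same kernel applied in parallel at several dilation rates, so that each branch sees a different amount of context), the following is a minimal 1D sketch in plain Python. The function names, the dilation rates (1, 2, 4), the odd kernel size, and fusion by simple averaging are all assumptions made for illustration and are not taken from the paper.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1D cross-correlation with a dilated kernel.

    Assumes an odd kernel length so the kernel has a well-defined center.
    """
    k = len(kernel)
    center = k // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            # Kernel taps are spaced `dilation` samples apart.
            idx = i + (j - center) * dilation
            if 0 <= idx < len(signal):  # positions outside act as zero padding
                acc += w * signal[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilation):
    """Number of input samples spanned by one dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def multi_dilation_block(signal, kernel, rates=(1, 2, 4)):
    """Hypothetical fusion: run parallel dilated branches and average them."""
    branches = [dilated_conv1d(signal, kernel, r) for r in rates]
    return [sum(values) / len(rates) for values in zip(*branches)]


if __name__ == "__main__":
    impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
    for rate in (1, 2, 4):
        print(rate, receptive_field(3, rate), dilated_conv1d(impulse, [1, 1, 1], rate))
    print(multi_dilation_block(impulse, [1, 1, 1]))
```

With a kernel of size 3, the span covered by a single branch grows from 3 to 5 to 9 samples as the dilation rate goes from 1 to 2 to 4, without adding any weights. A real 2D block would typically use learned convolution layers in a deep learning framework rather than a fixed kernel.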
