Article

MGRW-Transformer: Multigranularity Random Walk Transformer Model for Interpretable Learning

Publisher

IEEE (Institute of Electrical and Electronics Engineers Inc.)
DOI: 10.1109/TNNLS.2023.3326283

Keywords

Graph random walk; interpretable method; multigranularity formal analysis; self-attention mechanism; vision transformer (ViT)

Abstract

Deep-learning models have been widely used in image recognition tasks because of their strong feature-learning ability. However, most current deep-learning models are black-box systems that lack a semantic explanation of how they reach their conclusions, which makes it difficult to apply them to complex medical image recognition tasks. The vision transformer (ViT) is the most commonly used deep-learning model with a self-attention mechanism, which exposes the regions that influence a prediction and therefore offers greater interpretability than traditional convolutional networks. However, medical images often contain lesions of variable size in different locations, which makes it difficult for a model relying on a self-attention module alone to reach correct and explainable conclusions. We propose a multigranularity random walk transformer (MGRW-Transformer) model guided by an attention mechanism to find the regions that influence the recognition task. Our method divides the image into multiple subimage blocks and passes them to the ViT module for classification; simultaneously, the attention matrix output by the multihead attention layer is fused with the multigranularity random walk module. Within that module, the segmented image blocks serve as the nodes of an undirected graph, and the most strongly attended node is taken as the starting node to guide a coarse-grained random walk. Coarse blocks are divided into finer ones where appropriate to manage the computational cost, and the results are combined according to the importance of the discovered features. The model thus offers a semantic interpretation of the input image, a visualization of that interpretation, and insight into how the decision was reached. Experimental results show that our method improves classification performance on medical images while presenting an interpretation that medical professionals can understand.
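To make the idea in the abstract concrete, the Python sketch below shows one way an attention matrix could guide a random walk over ViT patch nodes. It is a minimal reconstruction under stated assumptions, not the authors' implementation: the function names, the use of head-averaged attention, the [CLS]-attention seed, visit frequency as the importance score, and the top-k refinement step are all assumptions made for illustration.

```python
import numpy as np

def attention_guided_walk(attn, cls_attn, n_walks=50, n_steps=200, seed=0):
    """Sketch of an attention-guided random walk over patch nodes.

    attn:     (n, n) patch-to-patch attention, assumed averaged over heads
    cls_attn: (n,)   attention from the [CLS] token to each patch (assumption)
    Returns a per-patch importance score based on visit frequency.
    """
    rng = np.random.default_rng(seed)
    n = attn.shape[0]

    # Undirected graph: symmetrize the attention matrix, drop self-loops.
    w = (attn + attn.T) / 2.0
    np.fill_diagonal(w, 0.0)
    # Row-normalize into a stochastic transition matrix.
    p = w / w.sum(axis=1, keepdims=True)

    visits = np.zeros(n)
    start = int(np.argmax(cls_attn))  # most-attended patch seeds the walk
    for _ in range(n_walks):
        node = start
        for _ in range(n_steps):
            node = rng.choice(n, p=p[node])
            visits[node] += 1
    return visits / visits.sum()

def refine_top_patches(importance, k=5):
    """Toy coarse-to-fine step: the k most-visited coarse patches become
    candidates for subdivision into finer blocks (assumption about how the
    multigranularity refinement might be triggered)."""
    return np.argsort(importance)[-k:]

if __name__ == "__main__":
    # Hypothetical usage with random stand-ins for real ViT attention maps.
    n = 196  # 14 x 14 grid of 16x16 patches for a 224x224 input
    attn = np.abs(np.random.default_rng(1).normal(size=(n, n)))
    cls_attn = attn.mean(axis=0)
    importance = attention_guided_walk(attn, cls_attn)
    print(importance.reshape(14, 14).round(3))   # coarse per-patch saliency
    print(refine_top_patches(importance))        # patches to subdivide
```

In a fuller version, the walk would presumably be re-run on the subdivided blocks so that importance is estimated at several granularities before the results are fused into the final interpretation.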
