4.7 Article

Visual prior-based cross-modal alignment network for radiology report generation

Journal

COMPUTERS IN BIOLOGY AND MEDICINE
Volume 166, Issue -, Pages -

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.compbiomed.2023.107522

Keywords

Radiology report generation; Visual prior; Contrastive attention; Cross-modal alignment; Multi-head attention

Abstract

Automated radiology report generation is gaining popularity as a means to alleviate the workload of radiologists and prevent misdiagnoses and missed diagnoses. By imitating the working patterns of radiologists, previous report generation approaches have achieved remarkable performance. However, these approaches suffer from two significant problems: (1) lack of visual prior: medical observations in radiology images are interdependent and exhibit certain patterns, and the lack of such a visual prior reduces accuracy in identifying abnormal regions; (2) lack of alignment between images and texts: the absence of annotations and alignments for regions of interest in the radiology images and reports can lead to inconsistent visual and textual features for the abnormal regions generated by the model. To address these issues, we propose a Visual Prior-based Cross-modal Alignment Network for radiology report generation. First, we propose a novel Contrastive Attention that compares the input image with normal images to extract difference information, namely the visual prior, which helps to identify abnormalities quickly. Then, to facilitate the alignment of images and texts, we propose a Cross-modal Alignment Network that leverages a cross-modal matrix, initialized with features generated by pre-trained models, to compute cross-modal responses for visual and textual features. Finally, a Visual Prior-guided Multi-Head Attention is proposed to incorporate the visual prior into the generation process. Extensive experimental results on two benchmark datasets, IU-Xray and MIMIC-CXR, show that our proposed model outperforms state-of-the-art models on almost all metrics, achieving BLEU-4 scores of 0.188 and 0.116 and CIDEr scores of 0.409 and 0.240, respectively.
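
No reference implementation accompanies this record, but the two attention mechanisms named in the abstract are concrete enough to sketch. Below is a minimal PyTorch sketch: the module names `ContrastiveAttention` and `VisualPriorGuidedMHA`, all dimensions, and the sigmoid gate used to mix the two attention contexts are illustrative assumptions, not the authors' implementation (the Cross-modal Alignment Network is omitted).

```python
# Minimal PyTorch sketch (illustrative assumptions, not the authors' code).
import torch
import torch.nn as nn


class ContrastiveAttention(nn.Module):
    """Contrast input-image features with normal-image features to extract
    difference information, i.e. the visual prior described in the abstract."""

    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)

    def forward(self, img_feats: torch.Tensor, normal_feats: torch.Tensor) -> torch.Tensor:
        # img_feats:    (B, N, D) patch features of the input image
        # normal_feats: (B, M, D) patch features pooled from normal images
        q = self.query(img_feats)                                   # (B, N, D)
        k = self.key(normal_feats)                                  # (B, M, D)
        attn = torch.softmax(q @ k.transpose(1, 2) / q.size(-1) ** 0.5, dim=-1)
        closest_normal = attn @ normal_feats                        # (B, N, D)
        # The residual against the closest normal appearance highlights
        # abnormal regions; this difference serves as the visual prior.
        return img_feats - closest_normal


class VisualPriorGuidedMHA(nn.Module):
    """Multi-head attention whose output mixes ordinary visual context with
    context computed from the visual prior (the gating is an assumption)."""

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.mha = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, text_feats, img_feats, prior):
        # text_feats: (B, T, D) decoder hidden states
        # img_feats:  (B, N, D) visual features
        # prior:      (B, N, D) output of ContrastiveAttention
        ctx, _ = self.mha(text_feats, img_feats, img_feats)         # (B, T, D)
        prior_ctx, _ = self.mha(text_feats, prior, prior)           # (B, T, D)
        g = torch.sigmoid(self.gate(torch.cat([ctx, prior_ctx], dim=-1)))
        return g * ctx + (1 - g) * prior_ctx


# Usage sketch:
# prior = ContrastiveAttention(512)(img_feats, normal_feats)
# fused = VisualPriorGuidedMHA(512)(text_feats, img_feats, prior)
```

The sigmoid gate is one plausible way to let the decoder weight ordinary visual context against the abnormality-focused prior at each generation step; the paper may incorporate the prior differently.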

