Journal
MEDICAL IMAGING 2022: IMAGE PROCESSING
Volume 12032, Issue -, Pages -
Publisher
SPIE-INT SOC OPTICAL ENGINEERING
DOI: 10.1117/12.2611469
Keywords
Optical coherence tomography; automatic report generation; visual and semantic attention mechanism
Funding
- National Key Research and Development Program of China [2018YFA0701700]
- National Natural Science Foundation of China [U20A20170]
In this paper, a deep learning-based VSTA model is proposed for report generation from OCT images. By embedding semantic attention and visual attention into the model, as well as initializing attention weights with semantic tags based on image similarity, the prediction accuracy of the model is improved.
Optical coherence tomography (OCT) is widely used in the diagnosis of retinal diseases. Reading OCT images and summarizing their findings is a routine yet time-consuming task, which automatic report generation can alleviate. The task poses two major challenges: (1) an OCT image may contain several fundus abnormalities, and it is difficult to detect them all simultaneously; (2) the diagnostic reports are complex, because they need to describe multiple lesions. In this paper, we propose a deep learning-based model, named the VSTA model (Visual and Semantic Topic Attention model), which generates a report from an input OCT image. Our major contributions are: (1) semantic attention and visual attention are jointly embedded in the model to generate diagnostic reports with complex content; (2) semantic tags based on image similarity are employed to initialize the semantic attention weights, which increases the prediction accuracy of the model. With the proposed VSTA model, the BLEU-4, CIDEr and ROUGE-L scores reach 31.16, 264.22 and 52.58, respectively, which are better than those of some existing advanced methods.
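The abstract does not specify the VSTA architecture, so the following is only a toy sketch of the two ideas it names: attending jointly over visual features and semantic tags, and using tags from similar images to seed the semantic attention. Every name here (`attend`, `tag_prior_from_similar_images`, the toy vectors) is hypothetical, and the choice to inject the tag prior as a log-bias on the attention scores is an assumption, not the authors' method.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, items, prior=None):
    """Dot-product soft attention; returns (weights, weighted sum of items).

    If `prior` is given, it is added to the scores as a log-bias
    (an assumed way of seeding the attention weights).
    """
    scores = [dot(query, item) for item in items]
    if prior is not None:
        scores = [s + math.log(p + 1e-9) for s, p in zip(scores, prior)]
    weights = softmax(scores)
    dim = len(items[0])
    context = [sum(w * item[d] for w, item in zip(weights, items))
               for d in range(dim)]
    return weights, context

def tag_prior_from_similar_images(image_vec, bank, num_tags, k=2):
    """Smoothed tag frequencies among the k bank images most similar to image_vec."""
    ranked = sorted(bank, key=lambda e: cosine(image_vec, e[0]), reverse=True)[:k]
    counts = [1.0] * num_tags  # add-one smoothing so no tag has zero prior
    for _, tags in ranked:
        for t in tags:
            counts[t] += 1.0
    total = sum(counts)
    return [c / total for c in counts]

# Toy data (all values invented for illustration).
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # visual region features
tag_embeddings = [[1.0, 0.0], [0.0, 1.0]]        # embeddings of tags 0 and 1
bank = [([1.0, 0.1], [0]), ([0.9, 0.0], [0]), ([0.0, 1.0], [1])]  # (feature, tags)
image_vec = [1.0, 0.0]                           # global feature of the query image
hidden = [0.5, 0.5]                              # decoder state used as the query

prior = tag_prior_from_similar_images(image_vec, bank, num_tags=2, k=2)
vis_w, vis_ctx = attend(hidden, regions)
sem_w, sem_ctx = attend(hidden, tag_embeddings, prior=prior)
joint_context = vis_ctx + sem_ctx  # concatenated context for a report decoder
```

In this sketch the two nearest bank images both carry tag 0, so the semantic attention starts out weighted toward that tag, while the visual attention is driven by the decoder state alone. A real model would learn projection matrices for the query and items and feed the joint context to a recurrent or transformer decoder.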