Article

Perceiving Multiple Representations for scene text image super-resolution guided by text recognizer

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.engappai.2023.106551

Keywords

Scene text image super-resolution; Scene text recognition; Contextual information; Visual features; Frequency domain learning


In this paper, a Perceiving Multiple Representations (PerMR) method for scene text image super-resolution is proposed. It combines super-resolution with text recognition and utilizes the feedback of the recognizer to improve super-resolution performance and enhance image quality. Experimental results demonstrate that PerMR can generate more distinguishable images and outperform state-of-the-art methods.
Single image super-resolution (SISR) aims to recover clear high-resolution images from low-resolution images, and has made great progress with the development of deep learning in recent years. Scene text image super-resolution (STISR) is a subfield of SISR whose goal is to increase the resolution of a low-resolution text image and enhance the readability of the characters in it. Despite significant improvements in recent approaches, STISR remains a challenging task due to the diversity of backgrounds, text appearances, layouts, etc. This paper presents a Perceiving Multiple Representations (PerMR) method for better super-resolution performance on scene text images. PerMR is a unified network that combines super-resolution with text recognition and exploits the recognizer's feedback to facilitate super-resolution. Specifically, contextual information from the text decoder is extracted to provide sequence-specific guidance and enable the super-resolution model to pay more attention to the text region. Meanwhile, low-level and high-level visual features from the vision backbone of the recognition network are integrated to further improve visual quality. Additionally, we incorporate a frequency branch into the vanilla convolution unit, which efficiently enhances global and local feature representations. Experiments on the STISR benchmark dataset TextZoom validate that PerMR not only generates more distinguishable images but also outperforms the current state-of-the-art methods. PerMR boosts the average recognition accuracy by 5.9% using ASTER, 5.8% using MORAN, and 10.6% using CRNN compared to the baseline model TSRN. PerMR outperforms the advanced method TPGSR-3 by 1.4% on ASTER, 0.1% on MORAN, and 0.2% on CRNN, and boosts TATT by 0.6% on ASTER and 1.1% on MORAN, respectively. Furthermore, PerMR demonstrates good robustness and generalization when tackling low-quality text images in multiple scene text recognition datasets. The experimental results verify the capability of PerMR to boost text recognition performance.
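The abstract mentions adding a frequency branch to a vanilla convolution unit so that the unit captures both local (spatial) and global (Fourier-domain) structure. The sketch below illustrates the general dual-branch idea only; the function names, the simple scalar frequency modulation, and the additive fusion are illustrative assumptions, not the actual PerMR architecture from the paper.

```python
import numpy as np

def frequency_branch(x, scale=1.0):
    """Hypothetical frequency branch: process the feature map in the
    Fourier domain (where each coefficient depends on the whole image,
    giving a global receptive field), then return to the spatial domain.
    `scale` is a placeholder for learned frequency-domain weights."""
    freq = np.fft.fft2(x, axes=(-2, -1))   # 2-D FFT over H, W
    freq = freq * scale                    # stand-in for learned filtering
    return np.fft.ifft2(freq, axes=(-2, -1)).real

def conv_unit_with_frequency(x, kernel):
    """Vanilla 3x3 convolution (local, spatial branch) plus the
    frequency branch (global), fused here by simple addition."""
    h, w = x.shape
    pad = np.pad(x, ((1, 1), (1, 1)), mode="edge")
    spatial = np.zeros_like(x)
    for i in range(h):
        for j in range(w):
            spatial[i, j] = np.sum(pad[i:i + 3, j:j + 3] * kernel)
    return spatial + frequency_branch(x)

feat = np.random.rand(8, 8).astype(np.float64)  # toy single-channel feature map
k = np.ones((3, 3)) / 9.0                       # mean filter as a stand-in kernel
out = conv_unit_with_frequency(feat, k)
print(out.shape)  # (8, 8)
```

With `scale=1.0` the frequency branch is an identity round-trip; in a real model both branches would carry learned parameters, and the fusion might use concatenation or attention rather than addition.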
