Residual Dual Scale Scene Text Spotting by Fusing Bottom-Up and Top-Down Processing

INTERNATIONAL JOURNAL OF COMPUTER VISION (2021)

Journal

INTERNATIONAL JOURNAL OF COMPUTER VISION

Volume 129, Issue 3, Pages 619-637

Publisher

SPRINGER

DOI: 10.1007/s11263-020-01388-x

Keywords

Scene text spotting; Arbitrary shapes; Bottom-up; Top-down; Residual dual scale

Funding

Major Project for New Generation AI [2018AAA0100400]
National Natural Science Foundation of China [61733007, 61721004]
Key Research Program of Frontier Sciences of CAS [ZDBS-LY-7004]
Youth Innovation Promotion Association of CAS [2019141]

Automated Summary

The paper introduces a text spotter that fuses bottom-up and top-down processing to detect and recognize arbitrarily shaped text. A differentiable operator, RoISlide, extracts features for arbitrary text regions from whole-image feature maps; a CNN and CTC based recognizer removes the need for character-level annotations; and a residual dual scale spotting mechanism improves robustness against scale variance.

Abstract

Existing methods for arbitrary shaped text spotting can be divided into two categories: bottom-up methods detect and recognize local areas of text, and then group them into text lines or words; top-down methods detect text regions of interest, then apply polygon fitting and text recognition to the detected regions. In this paper, we analyze the advantages and disadvantages of these two methods, and propose a novel text spotter by fusing bottom-up and top-down processing. To detect text of arbitrary shapes, we employ a bottom-up detector to describe text with a series of rotated squares, and design a top-down detector to represent the region of interest with a minimum enclosing rotated rectangle. Then the text boundary is determined by fusing the outputs of two detectors. To connect arbitrary shaped text detection and recognition, we propose a differentiable operator named RoISlide, which can extract features for arbitrary text regions from whole image feature maps. Based on the extracted features through RoISlide, a CNN and CTC based text recognizer is introduced to make the framework free from character-level annotations. To improve the robustness against scale variance, we further propose a residual dual scale spotting mechanism, where two spotters work on different feature levels, and the high-level spotter is based on residuals of the low-level spotter. Our method has achieved state-of-the-art performance on four English datasets and one Chinese dataset, including both arbitrary shaped and oriented texts. We also provide abundant ablation experiments to analyze how the key components affect the performance.
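The abstract credits the CTC based recognizer with freeing the framework from character-level annotations. The reason is that CTC scores a whole label sequence against a sequence of per-frame predictions, so no per-character position is ever needed. The following is a minimal sketch of the standard greedy CTC decoding rule (collapse repeated classes, then drop blanks), not the authors' implementation; the function name, the blank index, and the toy alphabet are assumptions made for illustration.

```python
from typing import Sequence

BLANK = 0  # assumption: class 0 is the CTC blank symbol


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], alphabet: str) -> str:
    """Decode per-frame class scores into text with the greedy CTC rule.

    frame_scores[t][k] is the score of class k at frame t.
    Class k > 0 maps to alphabet[k - 1]; class 0 is the blank.
    """
    # Pick the highest-scoring class for every frame.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]

    chars = []
    previous = BLANK
    for k in best:
        # Emit a character only when the class changes and is not blank.
        if k != previous and k != BLANK:
            chars.append(alphabet[k - 1])
        previous = k
    return "".join(chars)


def one_hot(k: int, size: int) -> list:
    """Toy helper: a score vector that puts all mass on class k."""
    return [1.0 if i == k else 0.0 for i in range(size)]


if __name__ == "__main__":
    # Frame classes: a a _ a b b _ c  (with _ as blank)
    frames = [one_hot(k, 4) for k in [1, 1, 0, 1, 2, 2, 0, 3]]
    print(ctc_greedy_decode(frames, "abc"))
```

In this toy input the blank between the two runs of "a" keeps them from being merged, which is how CTC represents doubled letters without knowing where each character sits in the image.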
