Article

Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TPAMI.2019.2937086

Keywords

Scene text spotting; scene text detection; scene text recognition; arbitrary shapes; attention; segmentation

Funding

  1. National Key R&D Program of China [2018YFB1004600]
  2. National Program for Support of Topnotch Young Professionals
  3. NSFC [61733007]
  4. Program for HUST Academic Frontier Youth Team [2017QYTD08]

The paper introduces Mask TextSpotter, an end-to-end trainable neural network for scene text spotting that unifies text detection and recognition. Because both tasks are performed in two-dimensional space via semantic segmentation, the learning procedure is simpler and the model can handle text instances of irregular shapes, such as curved text.
Unifying text detection and text recognition in an end-to-end training fashion has become a new trend for reading text in the wild, as these two tasks are highly related and complementary. In this paper, we investigate the problem of scene text spotting, which aims at simultaneous text detection and recognition in natural images. An end-to-end trainable neural network named Mask TextSpotter is presented. Unlike previous text spotters, which follow a pipeline consisting of a proposal generation network and a sequence-to-sequence recognition network, Mask TextSpotter enjoys a simple and smooth end-to-end learning procedure, in which both detection and recognition are achieved directly in two-dimensional space via semantic segmentation. Further, a spatial attention module is proposed to enhance performance and universality. Benefiting from the proposed two-dimensional representation for both detection and recognition, it easily handles text instances of irregular shapes, for instance, curved text. We evaluate it on four English datasets and one multi-language dataset, achieving consistently superior performance over state-of-the-art methods in both detection and end-to-end text recognition tasks. Moreover, we investigate the recognition module of our method separately; it significantly outperforms state-of-the-art methods on both regular and irregular text datasets for scene text recognition.
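
To make the idea of recognizing text "in two-dimensional space via semantic segmentation" concrete, here is a minimal sketch of how a string might be read from a per-pixel character map. It is an illustration under stated assumptions, not the paper's decoding procedure: the function name `decode_label_map`, the input format (a grid of class indices with 0 as background), the 4-connected grouping, and the left-to-right ordering by mean column are all hypothetical choices made for this example.

```python
from collections import Counter, deque


def decode_label_map(label_map, charset):
    """Read a string from a 2D grid of per-pixel class indices.

    Assumptions (hypothetical, for illustration only):
    - label_map[y][x] == 0 means background;
    - class index k > 0 corresponds to charset[k - 1];
    - each connected foreground region is one character.
    """
    height = len(label_map)
    width = len(label_map[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    regions = []  # one (mean column, winning class index) pair per region

    for r in range(height):
        for c in range(width):
            if label_map[r][c] == 0 or seen[r][c]:
                continue
            # Flood-fill one connected foreground region (4-connectivity).
            queue = deque([(r, c)])
            seen[r][c] = True
            votes = Counter()
            col_sum = 0
            size = 0
            while queue:
                y, x = queue.popleft()
                votes[label_map[y][x]] += 1  # each pixel votes for its class
                col_sum += x
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and label_map[ny][nx] != 0 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            winner = votes.most_common(1)[0][0]
            regions.append((col_sum / size, winner))

    # Order characters left to right by the mean column of their region.
    regions.sort(key=lambda item: item[0])
    return "".join(charset[cls - 1] for _, cls in regions)


grid = [
    [0, 2, 2, 0, 0, 1, 0, 0, 3, 3],
    [0, 2, 1, 0, 0, 1, 0, 0, 3, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]
print(decode_label_map(grid, "abc"))
```

Because characters are located as regions in the plane rather than as positions in a one-dimensional sequence, the same grouping step works whether the regions lie on a straight line or along a curve. The ordering rule is the part that would need to change for vertical or strongly curved text, since sorting by mean column assumes a roughly horizontal reading direction.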
