4.8 Article

Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TPAMI.2019.2937086

Keywords

Scene text spotting; scene text detection; scene text recognition; arbitrary shapes; attention; segmentation

Funding

  1. National Key R&D Program of China [2018YFB1004600]
  2. National Program for Support of Topnotch Young Professionals
  3. NSFC [61733007]
  4. Program for HUST Academic Frontier Youth Team [2017QYTD08]

The paper introduces Mask TextSpotter, an end-to-end trainable neural network for scene text spotting that combines text detection and recognition. By performing both tasks in two-dimensional space via semantic segmentation, it simplifies the learning procedure and handles text instances of irregular shapes effectively.
Unifying text detection and text recognition in an end-to-end training fashion has become a new trend for reading text in the wild, as these two tasks are highly relevant and complementary. In this paper, we investigate the problem of scene text spotting, which aims at simultaneous text detection and recognition in natural images. An end-to-end trainable neural network named Mask TextSpotter is presented. Unlike previous text spotters, which follow a pipeline consisting of a proposal generation network and a sequence-to-sequence recognition network, Mask TextSpotter enjoys a simple and smooth end-to-end learning procedure in which both detection and recognition are achieved directly in two-dimensional space via semantic segmentation. Further, a spatial attention module is proposed to enhance performance and universality. Benefiting from the proposed two-dimensional representation for both detection and recognition, it easily handles text instances of irregular shapes, such as curved text. We evaluate it on four English datasets and one multi-language dataset, achieving consistently superior performance over state-of-the-art methods in both detection and end-to-end text recognition tasks. We also investigate the recognition module of our method separately; it significantly outperforms state-of-the-art methods on both regular and irregular text datasets for scene text recognition.
