4.7 Article

PageNet: Towards End-to-End Weakly Supervised Page-Level Handwritten Chinese Text Recognition

Journal

INTERNATIONAL JOURNAL OF COMPUTER VISION
Volume 130, Issue 11, Pages 2623-2645

Publisher

SPRINGER
DOI: 10.1007/s11263-022-01654-0

Keywords

Handwritten Chinese text recognition; Page-level handwritten text recognition; Weakly supervised learning; Reading order

Funding

  1. NSFC [61936003]
  2. GD-NSF [2017A030312006, 2021A1515011870]
  3. Science and Technology Foundation of Guangzhou Huangpu Development District [2020GH17]

Handwritten Chinese text recognition (HCTR) is an active research area, but most previous studies only focus on recognition of cropped text line images and ignore the errors caused by text line detection in real-world applications. This study proposes PageNet, an end-to-end weakly supervised page-level HCTR model that detects and recognizes characters and predicts reading order between them. PageNet is able to handle complex layouts, including multi-directional and curved text lines, and requires only transcripts for real data annotation, avoiding the cost of labeling bounding boxes. Experimental results on five datasets show PageNet's superiority over existing weakly supervised and fully supervised page-level methods.
Handwritten Chinese text recognition (HCTR) has been an active research topic for decades. However, most previous studies focus solely on recognizing cropped text line images, ignoring the errors introduced by text line detection in real-world applications. Although some approaches to page-level text recognition have been proposed in recent years, they are either limited to simple layouts or require very detailed annotations, including expensive line-level and even character-level bounding boxes. To address this, we propose PageNet for end-to-end weakly supervised page-level HCTR. PageNet detects and recognizes characters and predicts the reading order between them, which makes it more robust and flexible when dealing with complex layouts, including multi-directional and curved text lines. With the proposed weakly supervised learning framework, PageNet requires only transcripts to be annotated for real data, yet it can still output detection and recognition results at both the character and line levels, avoiding the labor and cost of labeling bounding boxes for characters and text lines. Extensive experiments on five datasets demonstrate the superiority of PageNet over existing weakly supervised and fully supervised page-level methods. These experimental results may spark further research beyond existing methods based on connectionist temporal classification or attention. The source code is available at https://github.com/shannanyinxiang/PageNet.
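
To illustrate how character-level predictions plus a reading order can yield line-level output, the following is a minimal sketch, not PageNet's actual implementation. It assumes a hypothetical representation in which each detected character carries the index of the character that follows it in reading order (or `None` at the end of a line); the function name `assemble_lines` and the `next_index` parameter are invented for illustration.

```python
from typing import List, Optional, Sequence


def assemble_lines(
    chars: Sequence[str],
    next_index: Sequence[Optional[int]],
) -> List[str]:
    """Group recognized characters into text lines by following successor links.

    Assumption (not from the paper): next_index[i] is the index of the
    character read immediately after character i, or None if i ends a line.
    """
    # Any character that nothing points to is treated as the start of a line.
    has_predecessor = {i for i in next_index if i is not None}
    starts = [i for i in range(len(chars)) if i not in has_predecessor]

    lines = []
    for start in starts:
        line, seen, cur = [], set(), start
        # Follow the links until the line ends; `seen` guards against cycles.
        while cur is not None and cur not in seen:
            seen.add(cur)
            line.append(chars[cur])
            cur = next_index[cur]
        lines.append("".join(line))
    return lines


# Four characters forming two lines; the links, not the list order, decide the grouping.
print(assemble_lines(["手", "写", "文", "本"], [1, None, 3, None]))
```

Because lines are recovered by following links between characters rather than by cropping rectangular regions, a representation of this kind places no constraint on line direction or curvature, which is the flexibility the abstract attributes to predicting reading order.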
