4.7 Article

PageNet: Towards End-to-End Weakly Supervised Page-Level Handwritten Chinese Text Recognition

Journal

INTERNATIONAL JOURNAL OF COMPUTER VISION
Volume 130, Issue 11, Pages 2623-2645

Publisher

SPRINGER
DOI: 10.1007/s11263-022-01654-0

Keywords

Handwritten Chinese text recognition; Page-level handwritten text recognition; Weakly supervised learning; Reading order

Funding

  1. NSFC [61936003]
  2. GD-NSF [2017A030312006, 2021A1515011870]
  3. Science and Technology Foundation of Guangzhou Huangpu Development District [2020GH17]

Handwritten Chinese text recognition (HCTR) is an active research area, but most previous studies focus only on recognizing cropped text line images and ignore the errors introduced by text line detection in real-world applications. This study proposes PageNet, an end-to-end weakly supervised page-level HCTR model that detects and recognizes characters and predicts the reading order between them. PageNet can handle complex layouts, including multi-directional and curved text lines, and requires only transcripts as annotations for real data, avoiding the cost of labeling bounding boxes. Experimental results on five datasets show PageNet's superiority over existing weakly supervised and fully supervised page-level methods.
Handwritten Chinese text recognition (HCTR) has been an active research topic for decades. However, most previous studies focus solely on the recognition of cropped text line images, ignoring the errors caused by text line detection in real-world applications. Although some approaches aimed at page-level text recognition have been proposed in recent years, they are either limited to simple layouts or require very detailed annotations, including expensive line-level and even character-level bounding boxes. To this end, we propose PageNet for end-to-end weakly supervised page-level HCTR. PageNet detects and recognizes characters and predicts the reading order between them, which makes it more robust and flexible when dealing with complex layouts, including multi-directional and curved text lines. Using the proposed weakly supervised learning framework, PageNet requires only transcripts to be annotated for real data; however, it can still output detection and recognition results at both the character and line levels, avoiding the labor and cost of labeling bounding boxes of characters and text lines. Extensive experiments conducted on five datasets demonstrate the superiority of PageNet over existing weakly supervised and fully supervised page-level methods. These experimental results may spark further research beyond the realms of existing methods based on connectionist temporal classification or attention. The source code is available at https://github.com/shannanyinxiang/PageNet.
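
The idea of recovering text lines from character-level predictions plus a reading order can be illustrated with a small sketch. This is not taken from the paper or its repository: the `CharPrediction` class, the `next_index` field, and the `assemble_lines` function are hypothetical names, and the sketch simply assumes that each detected character carries a pointer to its successor in reading order (or none at the end of a line). How PageNet actually represents and decodes reading order may differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CharPrediction:
    """Hypothetical per-character output: label, box, and successor link."""
    char: str
    box: Tuple[int, int, int, int]  # (x, y, width, height), illustrative only
    next_index: Optional[int]       # index of the next character, None at line end


def assemble_lines(chars: List[CharPrediction]) -> List[str]:
    """Group characters into text lines by following successor links."""
    # A character that is nobody's successor is assumed to start a line.
    successors = {c.next_index for c in chars if c.next_index is not None}
    starts = [i for i in range(len(chars)) if i not in successors]

    lines = []
    for start in starts:
        text = []
        visited = set()  # guards against accidental cycles in the links
        i: Optional[int] = start
        while i is not None and i not in visited:
            visited.add(i)
            text.append(chars[i].char)
            i = chars[i].next_index
        lines.append("".join(text))
    return lines


# Characters listed in arbitrary detection order; the links define the lines.
chars = [
    CharPrediction("写", (30, 10, 20, 20), None),
    CharPrediction("手", (10, 10, 20, 20), 0),
    CharPrediction("字", (10, 70, 20, 20), None),
    CharPrediction("汉", (10, 40, 20, 20), 2),
]
print(assemble_lines(chars))
```

Because lines are built from links between characters rather than from line-level boxes, a scheme of this kind does not depend on lines being horizontal or straight, which is consistent with the abstract's claim about multi-directional and curved text lines.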

Reviews

Primary Rating

4.7
Not enough ratings

