SCUT-EPT: New Dataset and Benchmark for Offline Chinese Text Recognition in Examination Paper

IEEE ACCESS (2019)

Journal

IEEE ACCESS

Volume 7, Pages 370-382

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/ACCESS.2018.2885398

Keywords

Offline handwritten Chinese text recognition (HCTR); educational documents; sequence transcription

Funding

National Key Research and Development Program of China [2016YFB1001405]
GD-NSF [2017A030312006]
NSFC [61673182, 61771199]
GDSTP [2017A010101027]
GZSTP [201607010227]

Abstract

Most existing studies and public datasets for handwritten Chinese text recognition are based on regular documents with clean, blank backgrounds, and little research has addressed handwritten text recognition in challenging areas such as educational documents and financial bills. In this paper, we focus on examination paper text recognition and construct a challenging dataset, the examination paper text (SCUT-EPT) dataset, which contains 50,000 text line images (40,000 for training and 10,000 for testing) selected from the examination papers of 2,986 volunteers. The SCUT-EPT dataset presents numerous novel challenges, including character erasure, text line supplement, character/phrase switching, noisy backgrounds, nonuniform word size, and unbalanced text length. In our experiments, current advanced text recognition methods, such as the convolutional recurrent neural network (CRNN), perform poorly on the SCUT-EPT dataset, which demonstrates the difficulty and significance of the dataset. Nevertheless, through visualization and error analysis, we observe that humans can avoid the vast majority of the erroneous predictions, which reveals the limitations of current methods for handwritten Chinese text recognition (HCTR). Finally, three popular sequence transcription methods, connectionist temporal classification (CTC), the attention mechanism, and cascaded attention-CTC, are investigated for the HCTR problem. Interestingly, although the attention mechanism has proved very effective in English scene text recognition, its performance is far inferior to that of the CTC method in HCTR with a large-scale character set.
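
For readers unfamiliar with CTC-based transcription, the following is a minimal sketch of greedy (best-path) CTC decoding, the standard way a per-frame label sequence is turned into text. It is an illustration only and is not taken from the paper: the function name `ctc_greedy_decode`, the choice of index 0 as the blank label, and the toy score matrix are all assumptions.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank label (not specified by the paper)


def ctc_greedy_decode(frame_scores, blank=BLANK):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks.

    frame_scores is a list of per-frame score lists, one score per label.
    """
    # Pick the highest-scoring label index for every frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # Merge runs of identical consecutive labels into a single label.
    collapsed = [label for label, _ in groupby(best_path)]
    # Remove blank labels; a blank between two equal labels keeps them distinct.
    return [label for label in collapsed if label != blank]


# Toy example with 3 labels (0 = blank) and 7 frames.
toy_scores = [
    [0.1, 0.8, 0.1],  # label 1
    [0.2, 0.7, 0.1],  # label 1 (repeat, merged)
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.8, 0.1],  # label 1 (kept, separated by blank)
    [0.1, 0.1, 0.8],  # label 2
    [0.1, 0.2, 0.7],  # label 2 (repeat, merged)
    [0.9, 0.05, 0.05],  # blank
]
decoded = ctc_greedy_decode(toy_scores)
```

In a Chinese recognizer the label indices would map to a character set of several thousand classes, which is the large-scale setting the abstract refers to.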
