4.6 Article

Dense and Tight Detection of Chinese Characters in Historical Documents: Datasets and a Recognition Guided Detector

Journal

IEEE ACCESS
Volume 6, Pages 30174-30183

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2018.2840218

Keywords

Historical documents; character detection; recognition guided detector; data sets

Funding

  1. National Key Research and Development Program of China [2016YFB1001405]
  2. NSFC [61472144, 61673182, 61771199]
  3. GD-NSF Grant [2017A030312006]
  4. GDSTP Grant [2017A030312006, 2015B010101004]
  5. GZSTP Grant [201607010227]

Characters in historical documents are typically densely distributed and are difficult to localize and segment by directly applying classic proposal and regression based methods. In this paper, we propose a novel method called recognition guided detector (RGD) that achieves tight Chinese character detection in historical documents. The proposed RGD consists of two simultaneously trained convolutional neural networks: a recognition guided proposal network that provides context information about the text, and a detection network that uses this information to localize each character accurately. To train and test the proposed method, we established two new datasets with character-level annotations, comprising the ground truth bounding box of each character and the ground truth character in each box. The data in our datasets are scanned images collected from nine different versions of Tripitaka in Han. Experimental results show that, guided by a text recognition network with a test accuracy of 97.25%, the detection network in our proposed method achieves a much higher F-score with fewer parameters than several state-of-the-art object detection and text detection methods, under a highly constrained evaluation criterion of intersection over union (IoU) ≥ 0.7. The datasets are publicly available at https://github.com/HCIILAB/TKH_MTH_Datasets_Release for non-commercial use.
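
The evaluation criterion mentioned in the abstract can be illustrated with a minimal sketch. It computes the IoU of two axis-aligned boxes and an F-score in which a predicted box counts as correct only if it overlaps a ground truth box with IoU of at least 0.7. The abstract does not describe the matching protocol, so the greedy one-to-one matching used here, the `(x1, y1, x2, y2)` box format, and the function names `iou` and `f_score` are assumptions for illustration, not the authors' evaluation code.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f_score(preds: List[Box], gts: List[Box], thresh: float = 0.7) -> float:
    """F-score with greedy one-to-one matching (an assumed protocol)."""
    matched = set()  # indices of ground truth boxes already claimed
    true_pos = 0
    for p in preds:
        best_j, best_iou = -1, 0.0
        for j, g in enumerate(gts):
            if j in matched:
                continue
            v = iou(p, g)
            if v > best_iou:
                best_j, best_iou = j, v
        # A prediction is a true positive only at or above the threshold.
        if best_j >= 0 and best_iou >= thresh:
            matched.add(best_j)
            true_pos += 1
    precision = true_pos / len(preds) if preds else 0.0
    recall = true_pos / len(gts) if gts else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Under a threshold this strict, a box that is only slightly loose around a character can fall below 0.7 and count as a miss, which is why the criterion rewards tight detection.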
