4.6 Article

An image database of handwritten Bangla words with automatic benchmarking facilities for character segmentation algorithms

Journal

NEURAL COMPUTING & APPLICATIONS
Volume 33, Issue 1, Pages 449-468

Publisher

SPRINGER LONDON LTD
DOI: 10.1007/s00521-020-04981-w

Keywords

Character segmentation; Handwritten word; Bangla script; Image database; Word recognition

Funding

  1. PURSE-II, Jadavpur University
  2. UPE-II, Jadavpur University
  3. DST, Govt. of India [EMR/2016/007213]

Recognition of unconstrained handwritten word images is a challenging research problem, especially when lexicon-free words are considered. Developing a comprehensive word recognition module requires a competent character segmentation technique. However, because standard word image databases with ground truth information are lacking, most character segmentation algorithms rely on self-made databases with manual evaluation. This study prepares a comprehensive database of handwritten Bangla word images for evaluating character segmentation algorithms, along with two types of ground truth images related to segmented character shapes. The benchmark result reported on the database is an F-score of 0.9212, which outperforms some state-of-the-art methods.
Recognition of unconstrained handwritten word images is an interesting research problem that becomes more challenging when lexicon-free words are considered. A prerequisite for developing a lexicon-free handwritten word recognition technique is segmenting a word image into its constituent characters. A competent character segmentation technique is therefore required to design a comprehensive word recognition module. However, a study of the literature reveals no standard word image database with ground truth information. As a result, most character segmentation algorithms in the literature rely on self-made databases with manual evaluation. To fill this gap, the present work prepares a comprehensive database of handwritten Bangla word images, primarily for evaluating character segmentation algorithms. It also provides two types of ground truth images related to the segmented character shapes of the word images. In addition, an evaluation tool is developed for assessing the performance of any character segmentation algorithm on the benchmark database. The benchmark result reported here is an F-score of 0.9212, which outperforms some state-of-the-art methods.
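
As a rough illustration of how an F-score for character segmentation could be computed, the sketch below matches predicted cut columns against ground-truth cut columns and derives precision, recall, and F-score from the match counts. The abstract does not describe the paper's evaluation protocol, so everything here is an assumption made for illustration: representing a segmentation as a list of column positions, the greedy one-to-one matching, the pixel tolerance `tol`, and the function names `match_cuts` and `f_score`. It is not the authors' evaluation tool.

```python
def match_cuts(predicted, truth, tol=3):
    """Greedily match predicted cut columns to ground-truth cut columns.

    Assumption: a predicted cut counts as correct if it lies within `tol`
    pixels of a ground-truth cut that has not been matched already.
    Returns (true positives, false positives, false negatives).
    """
    unmatched = sorted(truth)
    tp = 0
    for p in sorted(predicted):
        # Nearest ground-truth column that is still unmatched, if any.
        best = min(unmatched, key=lambda t: abs(t - p), default=None)
        if best is not None and abs(best - p) <= tol:
            unmatched.remove(best)
            tp += 1
    fp = len(predicted) - tp  # predicted cuts with no ground-truth partner
    fn = len(truth) - tp      # ground-truth cuts that were missed
    return tp, fp, fn


def f_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical cut positions for a single word image.
    tp, fp, fn = match_cuts([10, 25, 41, 70], [11, 26, 55], tol=3)
    print(tp, fp, fn, round(f_score(tp, fp, fn), 4))
```

With these made-up positions, two of the four predicted cuts fall within tolerance of a ground-truth cut and one ground-truth cut is missed, so precision is 0.5 and recall is 2/3. A benchmark over a whole database would typically accumulate the counts across all word images before computing the score.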
