Journal
EXPERT SYSTEMS
Volume 36, Issue 6
Publisher
WILEY
DOI: 10.1111/exsy.12459
Keywords
feature selection; handwritten text; Indic scripts; K-means clustering; modified log-Gabor transform; script classification; symmetrical uncertainty
Funding
- Centre for Microprocessor Application for Training Education and Research (CMATER)
- Project on Storage Retrieval and Understanding of Video for Multimedia (SRUVM) of Computer Science and Engineering Department, Jadavpur University
India has numerous officially recognized scripts, so documents often need to be categorized by the script in which they are written. Identifying the script used in a document is essential for handling it effectively, whether manually or digitally. Script identification in document images is an important research problem in pattern recognition; it often suffers from the growing dimensionality of the feature vector and therefore requires an efficient feature selection technique. With this in mind, we propose a clustering-based filter feature selection framework that extracts an optimal and effective feature subset from the original feature vector. The methodology is evaluated on a script classification problem involving handwritten documents in 12 major Indic scripts, with experiments at the word, text-line, and block levels. The experiments show a reasonable improvement in classification accuracy while using comparatively fewer features. The proposed framework is computationally inexpensive and can also be applied to other pattern recognition problems.
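The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible reading of the keywords (K-means clustering, symmetrical uncertainty), not the authors' method. It assumes features have already been discretized, groups similar feature columns with a plain K-means, and keeps from each cluster the feature with the highest symmetrical uncertainty with respect to the class label. The names `select_features`, `kmeans`, and `symmetrical_uncertainty` are hypothetical, and the seeding of centroids with the first `k` columns is a simplification chosen for determinism.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)); 0 = independent, 1 = fully dependent."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)

def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means; the first k points seed the centroids (assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def select_features(X, y, k):
    """Cluster feature columns, then keep the highest-SU feature of each cluster."""
    columns = [list(col) for col in zip(*X)]  # one point per feature
    labels = kmeans(columns, k)
    selected = []
    for c in range(k):
        idx = [j for j, lab in enumerate(labels) if lab == c]
        if idx:
            selected.append(
                max(idx, key=lambda j: symmetrical_uncertainty(columns[j], y)))
    return sorted(selected)

# Toy data: 6 samples, 4 discretized features, 2 classes (illustrative only).
X = [[0, 1, 0, 1],
     [0, 0, 0, 0],
     [0, 1, 1, 1],
     [1, 0, 1, 0],
     [1, 1, 1, 1],
     [1, 0, 1, 1]]
y = [0, 0, 0, 1, 1, 1]
chosen = select_features(X, y, k=2)
```

Because this is a filter approach, no classifier is trained during selection, which is consistent with the abstract's claim of low computational cost. Real-valued features, such as the log-Gabor responses named in the keywords, would need to be discretized before the symmetrical uncertainty step.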