4.7 Article

A novel framework for automatic sorting of postal documents with multi-script address blocks

期刊

PATTERN RECOGNITION
卷 43, 期 10, 页码 3507-3521

出版社

ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2010.05.018

关键词

Automatic mail sorting; Multi-script postal address block; Script identification for numerals; Quad-tree based feature extraction; Handwritten numeral recognition; Support vector machine

资金

  1. A.I.C.T.E. (New Delhi, India) [1-51/RID/EF(13)/2007-08]

向作者/读者索取更多资源

Recognition of numeric postal codes in a multi-script environment is a classical problem in any postal automation system. In such postal documents, determination of the script of the handwritten postal codes is crucial for subsequent invocation of the digit recognizers for respective scripts. The current framework attempts to infer about the script of the numeric postal code without having any bias from the script of the textual address part of the rest of the address block, as they might differ in a potential multi-script environment. Scope of the current work is to recognize the postal codes written in any of the four popular scripts, viz., Latin, Devanagari, Bangla and Urdu. For this purpose, we first implement a Hough transformation based technique to localize the postal-code blocks from structured postal documents with defined address block region. Isolated handwritten digit patterns are then extracted from the localized postal-code region. In the next stage of the developed framework, similar shaped digit patterns of the said four scripts are grouped in 25 clusters. A script independent unified pattern classifier is then designed to classify the numeric postal codes into one of these 25 clusters. Based on these classification decisions a rule-based script inference engine is designed to infer about the script of the numeric postal code. One of the four script specific classifiers is subsequently invoked to recognize the digit patterns of the corresponding script. A novel quad-tree based image partitioning technique is also developed in this work for effective feature extraction from the numeric digit patterns. The average recognition accuracy over ten-fold cross validation of results for the support vector machine (SVM) based 25-class unified pattern classifier is obtained as 92.03%. With randomly selected six-digit numeric strings of four different scripts: an average of 96.72% script inference accuracy is achieved. The average of tenfold cross-validation recognition accuracies of the individual SVM classifiers for the Latin, Devanagari, Bangla and Urdu numerals are observed as 95.55%, 95.63%, 97.15% and 96.20%, respectively. (C) 2010 Elsevier Ltd. All rights reserved.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.7
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据