Journal
IEEE ACCESS
Volume 6, Issue -, Pages 10603-10617Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2018.2795104
Keywords
Character recognition; character segmentation; document analysis; graph theory; multilingual Indian optical character recognition
Ask authors/readers for more resources
In this paper, robust algorithms for character segmentation and recognition are presented for multilingual Indian document images of Latin and Devanagari scripts. These documents generally suffer from their layout organizations, local skews, and low print quality and contain intermixed texts (machine-printed and handwritten). In the proposed character segmentation algorithm, primary segmentation paths are obtained using structural property of characters, whereas overlapped and joined characters are separated using graph distance theory. Finally, segmentation results are validated using highly accurate support vector machine classifier. For the proposed character recognition algorithm, three new geometrical shape-based features are computed. First and second features are formed with respect to the center pixel of character, whereas neighborhood information of text pixels is used for the calculation of third feature. For recognizing the input character, k-Nearest Neighbor classifier is used, as it has intrinsically zero training time. Comprehensive experiments are carried out on different databases containing printed as well as handwritten texts. Benchmarking results illustrate that proposed algorithms have better performances compared to other contemporary approaches, where highest segmentation and recognition rates of 98.86% and 99.84%, respectively, are obtained.
Authors
I am an author on this paper
Click your name to claim this paper and add it to your profile.
Reviews
Recommended
No Data Available