Article

DIGITNET: A Deep Handwritten Digit Detection and Recognition Method Using a New Historical Handwritten Digit Dataset

BIG DATA RESEARCH (2021)

Journal

BIG DATA RESEARCH

Volume 23, Issue -, Pages -

Publisher

ELSEVIER

DOI: 10.1016/j.bdr.2020.100182

Keywords

Historical handwritten documents; Handwritten digit detection; Ensemble deep learning; Digit string recognition; DIDA handwritten digit dataset

Categories

Computer Science, Artificial Intelligence; Computer Science, Information Systems; Computer Science, Theory & Methods

Funding

Knowledge Foundation [20140032]

Smart Summary

This paper presents the DIGITNET deep learning architecture and DIDA digit dataset for detecting and recognizing digits in historical handwritten documents from the nineteenth century. The dataset is generated from 100,000 Swedish historical document images and contains three sub-datasets for training the DIGITNET network, which outperforms existing methods according to experimental results.

Abstract

This paper introduces a novel deep learning architecture, named DIGITNET, and a large-scale digit dataset, named DIDA, to detect and recognize handwritten digits in historical document images written in the nineteenth century. To generate the DIDA dataset, digit images are collected from 100,000 Swedish handwritten historical document images, which were written by different priests with different handwriting styles. This dataset contains three sub-datasets, namely single digit, large-scale bounding box annotated multi-digit, and digit string, with 250,000, 25,000, and 200,000 samples in Red-Green-Blue (RGB) color space, respectively. Moreover, DIDA is used to train the DIGITNET network, which consists of two deep learning architectures, called DIGITNET-dect and DIGITNET-rec, to isolate digits and recognize digit strings in historical handwritten documents, respectively. In the DIGITNET-dect architecture, features are extracted from digits using three residual units, each containing three convolutional neural network structures, and a detection strategy based on the You Only Look Once (YOLO) algorithm is then employed to detect handwritten digits at two different scales. In DIGITNET-rec, the detected isolated digits are passed through three differently designed Convolutional Neural Network (CNN) architectures, and the classification results of the three CNNs are then combined using a voting scheme to recognize digit strings. The proposed model is also trained with various existing handwritten digit datasets and then validated over historical handwritten digit strings. The experimental results show that the proposed architecture trained with DIDA (publicly available from: https://didadataset.io/DIDA) outperforms the state-of-the-art methods. © 2020 The Author(s). Published by Elsevier Inc.
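The voting step described for DIGITNET-rec can be illustrated with a minimal sketch. The abstract does not specify the voting rule, so this example assumes plain majority voting over the class labels from three classifiers, with ties resolved in favour of the earliest classifier. The function names `vote_digit` and `recognize_string` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Sequence


def vote_digit(labels: Sequence[int]) -> int:
    """Return the label chosen by most classifiers for one detected digit.

    Assumption (not from the paper): when several labels share the top
    count, the label from the earliest classifier in `labels` wins.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    best = max(counts.values())
    # Scan in classifier order so ties resolve to the earliest model.
    return next(label for label in labels if counts[label] == best)


def recognize_string(per_model_labels: Sequence[Sequence[int]]) -> str:
    """Combine per-digit predictions from several classifiers into a string.

    `per_model_labels[m][i]` is classifier m's label for the i-th detected
    digit, with digits assumed to be ordered left to right.
    """
    if len({len(model) for model in per_model_labels}) != 1:
        raise ValueError("every classifier must label the same number of digits")
    # zip(*...) regroups the labels so each tuple holds one digit's votes.
    return "".join(str(vote_digit(votes)) for votes in zip(*per_model_labels))


# Hypothetical predictions from three classifiers for a four-digit string.
predictions = [
    [1, 8, 4, 7],
    [1, 3, 4, 1],
    [7, 8, 4, 1],
]
print(recognize_string(predictions))
```

A confidence-weighted variant, summing each classifier's class probabilities instead of counting labels, would be an equally plausible reading of "voting scheme"; the label-counting form is shown only because it is the simplest.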
