Journal
2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WACV 2021
Volume -, Issue -, Pages 2199-2208
Publisher
IEEE
DOI: 10.1109/WACV48630.2021.00225
Funding
- MeitY, Government of India [TIN2017-89779-P]
- Amazon AWS Research Award
- CERCA Programme
DocVQA is a new dataset for Visual Question Answering on document images, with 50,000 questions defined on 12,000+ images. Analysis shows that existing models perform reasonably well on certain question types, but there is still a large performance gap compared to human performance. Models need to improve on questions where understanding the structure of the document is crucial.
We present a new dataset for Visual Question Answering (VQA) on document images called DocVQA. The dataset consists of 50,000 questions defined on 12,000+ document images. A detailed analysis of the dataset in comparison with similar datasets for VQA and reading comprehension is presented. We report several baseline results by adopting existing VQA and reading comprehension models. Although the existing models perform reasonably well on certain types of questions, there is a large performance gap compared to human performance (94.36% accuracy). The models need to improve specifically on questions where understanding the structure of the document is crucial. The dataset, code, and leaderboard are available at docvqa.org.