3.8 Proceedings Paper

DocVQA: A Dataset for VQA on Document Images

Publisher

IEEE
DOI: 10.1109/WACV48630.2021.00225

Keywords

-

Funding

  1. MeitY, Government of India [TIN2017-89779-P]
  2. Amazon AWS Research Award
  3. CERCA Programme

Abstract

We present a new dataset for Visual Question Answering (VQA) on document images called DocVQA. The dataset consists of 50,000 questions defined on 12,000+ document images. A detailed analysis of the dataset in comparison with similar datasets for VQA and reading comprehension is presented. We report several baseline results by adopting existing VQA and reading comprehension models. Although the existing models perform reasonably well on certain types of questions, there is a large performance gap compared to human performance (94.36% accuracy). The models need to improve specifically on questions where understanding the structure of the document is crucial. The dataset, code, and leaderboard are available at docvqa.org.
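
Since the dataset is distributed as JSON annotation files alongside the document images, a minimal loading sketch may help orient readers. The field names below (a top-level "data" list with "question", "image", and "answers" entries) follow the layout of the public DocVQA release as best recalled, and should be verified against the files downloaded from docvqa.org; the filename is a placeholder.

```python
import json

def load_docvqa(annotation_path):
    # Assumed layout of the public release: a top-level "data" list whose
    # entries pair a question with a document image path and the list of
    # acceptable ground-truth answers. Verify against the downloaded files.
    with open(annotation_path, encoding="utf-8") as f:
        annotations = json.load(f)
    return annotations["data"]  # one record per question

if __name__ == "__main__":
    records = load_docvqa("train_v1.0.json")  # placeholder filename
    print(f"{len(records)} questions")
    sample = records[0]
    print(sample["question"], "->", sample["answers"], "on", sample["image"])
```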
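
Beyond plain accuracy, the DocVQA leaderboard ranks methods by Average Normalized Levenshtein Similarity (ANLS), which gives partial credit for answers that differ from the ground truth only by small OCR-style errors. The sketch below is an illustrative implementation of the standard ANLS formulation (similarity threshold 0.5), not the official evaluation script from docvqa.org.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def anls(predictions, ground_truths, threshold=0.5):
    """predictions: list of strings; ground_truths: list of lists of strings."""
    total = 0.0
    for pred, answers in zip(predictions, ground_truths):
        best = 0.0
        for gt in answers:
            p, g = pred.strip().lower(), gt.strip().lower()
            denom = max(len(p), len(g)) or 1
            best = max(best, 1.0 - levenshtein(p, g) / denom)
        # Scores below the threshold are zeroed, so wildly wrong
        # answers receive no partial credit.
        total += best if best >= threshold else 0.0
    return total / max(len(predictions), 1)

print(anls(["the answre"], [["the answer"]]))  # near miss -> 0.8
```

Taking the maximum over the ground-truth answer list means a prediction only needs to match one of the acceptable variants, which matters for documents where the same value can be transcribed in several ways.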
