Proceedings Paper

Infographic VQA

2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022) (2022)

Journal

2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022)

Volume -, Issue -, Pages 2582-2591

Publisher

IEEE COMPUTER SOC

DOI: 10.1109/WACV51458.2022.00264

Categories

Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic; Imaging Science & Photographic Technology

Funding

MeitY, Government of India
CERCA Programme/Generalitat de Catalunya [PID2020-116298GB-I0]

Summary

This work explores the automatic understanding of infographic images using a Visual Question Answering (VQA) technique and presents a diverse dataset called InfographicVQA. The dataset requires methods to reason over document layout, textual content, graphical elements, and data visualizations. Two Transformer-based baselines are evaluated, and both fall well short of human performance on the dataset. The study suggests that VQA on infographics can serve as a benchmark for evaluating machine understanding of complex document images.

Abstract

Infographics communicate information using a combination of textual, graphical, and visual elements. This work explores the automatic understanding of infographic images by using a Visual Question Answering (VQA) technique. To this end, we present InfographicVQA, a new dataset comprising a diverse collection of infographics and question-answer annotations. The questions require methods that jointly reason over the document layout, textual content, graphical elements, and data visualizations. We curate the dataset with an emphasis on questions that require elementary reasoning and basic arithmetic skills. For VQA on the dataset, we evaluate two strong Transformer-based baselines. Both baselines yield unsatisfactory results compared to near-perfect human performance on the dataset. The results suggest that VQA on infographics (images that are designed to communicate information quickly and clearly to the human brain) is ideal for benchmarking machine understanding of complex document images. The dataset is available for download at docvqa.org.
