3.8 Proceedings Paper

ICDAR 2021 Competition on Document Visual Question Answering

Related References

Note: Only some of the references are listed; download the original paper for the complete reference information.
Proceedings Paper Computer Science, Artificial Intelligence

Infographic VQA

Minesh Mathew et al.

Summary: This work explores the automatic understanding of infographic images through Visual Question Answering (VQA) and presents a diverse dataset called InfographicVQA. The dataset requires methods to reason jointly over document layout, textual content, graphical elements, and data visualizations. Two Transformer-based baselines are evaluated, and both fall short of human performance on the dataset. The study suggests that VQA on infographics can serve as a benchmark for evaluating machine understanding of complex document images.

2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022) (2022)

Proceedings Paper Computer Science, Information Systems

Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer

Rafal Powalski et al.

Summary: This research introduces a neural network architecture called TILT, which can simultaneously learn layout information, visual features, and textual semantics to enhance natural language comprehension. Unlike previous approaches, the model relies on a decoder that can unify various natural language problems. By using a pretrained encoder-decoder Transformer as the core, the method achieves state-of-the-art results in document information extraction and layout understanding.

DOCUMENT ANALYSIS AND RECOGNITION - ICDAR 2021, PT II (2021)

Proceedings Paper Computer Science, Information Systems

Document Collection Visual Question Answering

Ruben Tito et al.

Summary: Current methods in Document Understanding focus on processing individual documents, while documents are typically organized in collections which provide valuable context for interpretation. To address this issue, DocCVQA introduces a new dataset and task where questions are posed over a whole collection of document images, aiming to provide answers to questions and retrieve the documents containing relevant information. Along with the dataset, a new evaluation metric and baselines are proposed to gain further insights into this new dataset and task.

DOCUMENT ANALYSIS AND RECOGNITION - ICDAR 2021, PT II (2021)

Proceedings Paper Computer Science, Artificial Intelligence

DocVQA: A Dataset for VQA on Document Images

Minesh Mathew et al.

Summary: DocVQA is a new dataset for Visual Question Answering on document images, with 50,000 questions defined on 12,000+ images. Analysis shows that existing models perform reasonably well on certain question types, but a large gap to human performance remains. Models need to improve in particular on questions where understanding the structure of the document is crucial.

2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021) (2021)

Proceedings Paper Computer Science, Artificial Intelligence

Scene Text Visual Question Answering

Ali Furkan Biten et al.

2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019) (2019)

Proceedings Paper Computer Science, Artificial Intelligence

DVQA: Understanding Data Visualizations via Question Answering

Kushal Kafle et al.

2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) (2018)

Article Computer Science, Artificial Intelligence

VQA: Visual Question Answering

Aishwarya Agrawal et al.

INTERNATIONAL JOURNAL OF COMPUTER VISION (2017)

Proceedings Paper Computer Science, Artificial Intelligence

Are You Smarter Than A Sixth Grader? Textbook Question Answering for Multimodal Machine Comprehension

Aniruddha Kembhavi et al.

30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017) (2017)

Proceedings Paper Computer Science, Interdisciplinary Applications

TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension

Mandar Joshi et al.

PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 1 (2017)