ICDAR 2021 Competition on Document Visual Question Answering

Proceedings Paper Computer Science, Artificial Intelligence

Infographic VQA

Minesh Mathew et al.

Summary: This work explores the automatic understanding of infographic images through Visual Question Answering and presents a diverse dataset called InfographicVQA. The dataset requires methods to reason jointly over document layout, textual content, graphical elements, and data visualizations. Two Transformer-based baselines are evaluated, but neither matches human performance on the dataset. The study suggests that VQA on infographics can serve as a benchmark for evaluating machine understanding of complex document images.

2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022) (2022)

Proceedings Paper Computer Science, Information Systems

Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer

Rafal Powalski et al.

Summary: This research introduces a neural network architecture called TILT, which can simultaneously learn layout information, visual features, and textual semantics to enhance natural language comprehension. Unlike previous approaches, the model relies on a decoder that can unify various natural language problems. By using a pretrained encoder-decoder Transformer as the core, the method achieves state-of-the-art results in document information extraction and layout understanding.

DOCUMENT ANALYSIS AND RECOGNITION - ICDAR 2021, PT II (2021)

Proceedings Paper Computer Science, Information Systems

Document Collection Visual Question Answering

Ruben Tito et al.

Summary: Current methods in Document Understanding focus on processing individual documents, whereas documents are typically organized in collections that provide valuable context for interpretation. To address this gap, DocCVQA introduces a new dataset and task in which questions are posed over a whole collection of document images; the goal is both to answer the questions and to retrieve the documents containing the relevant information. Along with the dataset, a new evaluation metric and baselines are proposed to provide further insight into the dataset and task.

DOCUMENT ANALYSIS AND RECOGNITION - ICDAR 2021, PT II (2021)

Proceedings Paper Computer Science, Artificial Intelligence

DocVQA: A Dataset for VQA on Document Images

Minesh Mathew et al.

Summary: DocVQA is a new dataset for Visual Question Answering on document images, with 50,000 questions defined on 12,000+ images. Analysis shows that existing models perform reasonably well on certain question types but still trail human performance by a large margin. Models need to improve on questions where understanding the structure of the document is crucial.

2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021) (2021)

Proceedings Paper Computer Science, Artificial Intelligence