Proceedings Paper

A Multilingual Approach to Scene Text Visual Question Answering

DOCUMENT ANALYSIS SYSTEMS, DAS 2022 (2022)

Related references

Note: Only some of the references are listed.

Proceedings Paper Computer Science, Artificial Intelligence

TAP: Text-Aware Pre-training for Text-VQA and Text-Caption

Zhengyuan Yang et al.

Summary: The paper proposes Text-Aware Pre-training (TAP) for the Text-VQA and Text-Caption tasks. TAP incorporates scene text during pre-training to learn better-aligned representations across three modalities: text words, visual objects, and scene text. Pre-trained on the large-scale OCR-CC dataset, the approach outperforms the state of the art by large margins on multiple tasks.

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 (2021)

Proceedings Paper Computer Science, Artificial Intelligence