Proceedings Paper

A Multilingual Approach to Scene Text Visual Question Answering

Related references

Note: Only some of the references are listed.
Proceedings Paper Computer Science, Artificial Intelligence

TAP: Text-Aware Pre-training for Text-VQA and Text-Caption

Zhengyuan Yang et al.

Summary: The paper proposes a Text-Aware Pre-training (TAP) method for the Text-VQA and Text-Caption tasks. TAP incorporates scene text during pre-training to improve aligned representation learning among the text word, visual object, and scene text modalities. Pre-trained on the large-scale OCR-CC dataset, the approach outperforms the state of the art by large margins on multiple tasks.

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 (2021)

Proceedings Paper Computer Science, Artificial Intelligence

Scene Text Visual Question Answering

Ali Furkan Biten et al.

2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019) (2019)

Article Computer Science, Artificial Intelligence

Learning Multilingual Word Embeddings in Latent Metric Space: A Geometric Approach

Pratik Jawanpuria et al.

TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (2019)

Proceedings Paper Computer Science, Artificial Intelligence

VQA: Visual Question Answering

Stanislaw Antol et al.

2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) (2015)