Related references
Note: Only some of the references are listed.
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption
Zhengyuan Yang et al.
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021 (2021)
Scene Text Visual Question Answering
Ali Furkan Biten et al.
2019 IEEE/CVF International Conference on Computer Vision (ICCV 2019) (2019)
Learning Multilingual Word Embeddings in Latent Metric Space: A Geometric Approach
Pratik Jawanpuria et al.
Transactions of the Association for Computational Linguistics (2019)
VQA: Visual Question Answering
Stanislaw Antol et al.
2015 IEEE International Conference on Computer Vision (ICCV) (2015)