Proceedings Paper

A Multilingual Approach to Scene Text Visual Question Answering

DOCUMENT ANALYSIS SYSTEMS, DAS 2022 (2022)

Journal

DOCUMENT ANALYSIS SYSTEMS, DAS 2022

Volume 13237, Issue -, Pages 65-79

Publisher

SPRINGER INTERNATIONAL PUBLISHING AG

DOI: 10.1007/978-3-031-06555-2_5

Keywords

Scene text; Visual question answering; Multilingual word embeddings; Vision and language; Deep learning

Categories

Computer Science, Artificial Intelligence; Computer Science, Information Systems; Imaging Science & Photographic Technology

Funding

MCIN/AEI [PDC2021-121512-I00, PID2020-116298GB-I00, PLEC2021-007850]
European Union Next Generation EU/PRTR

Summary

Scene Text Visual Question Answering (ST-VQA) is an active research topic in Computer Vision. Current models perform poorly when they must handle more than one language. This study explores the possibility of obtaining bilingual and multilingual VQA models and demonstrates that using multilingual word embeddings during training improves performance.

Abstract

Scene Text Visual Question Answering (ST-VQA) has recently emerged as an active research topic in Computer Vision. Current ST-VQA models have great potential for many types of applications, but they cannot perform well in more than one language at a time, both because multilingual data is scarce and because they are trained with monolingual word embeddings. In this work, we explore the possibility of obtaining bilingual and multilingual VQA models. To that end, we take an established VQA model that uses monolingual word embeddings as part of its pipeline and replace them with FastText and BPEmb multilingual word embeddings that have been aligned to English. Our experiments demonstrate that it is possible to obtain bilingual and multilingual VQA models with a minimal loss in performance in languages not used during training, as well as a multilingual model trained in multiple languages that matches the performance of the respective monolingual baselines.
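The idea of swapping monolingual embeddings for aligned multilingual ones can be illustrated with a minimal sketch. This is not the paper's implementation: the vectors below are hand-written 3-dimensional toy values (not real FastText or BPEmb vectors), and the names `ALIGNED`, `embed_question`, and `cosine` are hypothetical helpers introduced only for illustration. The sketch assumes that alignment places translation pairs close together in a shared space, so that a question encoder built on top of the lookup sees similar inputs for equivalent questions in different languages.

```python
import math

# Toy, hand-written vectors standing in for aligned multilingual embeddings.
# These are NOT real FastText or BPEmb values. Translation pairs are placed
# close together, which is the property that alignment aims for.
ALIGNED = {
    "en": {
        "what": [0.90, 0.10, 0.00],
        "brand": [0.10, 0.90, 0.20],
        "color": [0.00, 0.20, 0.90],
    },
    "es": {
        "qué": [0.88, 0.12, 0.02],
        "marca": [0.15, 0.85, 0.20],
        "color": [0.02, 0.22, 0.88],
    },
}


def embed_question(tokens, lang, table=ALIGNED):
    """Average the vectors of the known tokens (zero vector if none are known)."""
    vocab = table[lang]
    dim = len(next(iter(vocab.values())))
    vectors = [vocab[t] for t in tokens if t in vocab]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# The same question in English and Spanish, plus a different English question.
q_en = embed_question(["what", "brand"], "en")
q_es = embed_question(["qué", "marca"], "es")
q_other = embed_question(["what", "color"], "en")

same_meaning = cosine(q_en, q_es)
different_meaning = cosine(q_en, q_other)

print(f"en vs es, same question:      {same_meaning:.3f}")
print(f"en vs en, different question: {different_meaning:.3f}")
```

In this toy setting the English and Spanish versions of the same question end up closer to each other than two different English questions do, which is the intuition behind expecting a model trained in one language to transfer to another when the embedding layer is shared and aligned.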
