Article

Linguistically-aware attention for reducing the semantic gap in vision-language tasks

Journal

PATTERN RECOGNITION
Volume 112

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2020.107812

Keywords

Attention models; Visual question answering; Counting in visual question answering; Image captioning

The paper proposes a Linguistically-aware Attention (LAT) mechanism to bridge the semantic gap between the visual and textual modalities in vision-language tasks. LAT leverages object attributes and pre-trained language models to bring linguistic awareness to the attention process, and shows improved performance across several V-L tasks.
Attention models are widely used in vision-language (V-L) tasks to perform visual-textual correlation. Humans perform such correlation with a strong linguistic understanding of the visual world. However, even the best-performing attention models in V-L tasks lack such high-level linguistic understanding, creating a semantic gap between the modalities. In this paper, we propose an attention mechanism, Linguistically-aware Attention (LAT), that leverages object attributes obtained from generic object detectors along with pre-trained language models to reduce this semantic gap. LAT represents the visual and textual modalities in a common linguistically rich space, thus providing linguistic awareness to the attention process. We apply LAT and demonstrate its effectiveness in three V-L tasks: Counting-VQA, VQA, and image captioning. In Counting-VQA, we propose a novel counting-specific VQA model that predicts an intuitive count and achieves state-of-the-art results on five datasets. In VQA and captioning, we show the generic nature and effectiveness of LAT by adapting it into various baselines and consistently improving their performance. (c) 2021 Elsevier Ltd. All rights reserved.
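The abstract describes the core idea at a high level: attribute/class labels predicted by an object detector are embedded with a pre-trained language model and fused with the region's visual features, so that question-guided attention operates in a shared, linguistically rich space. Below is a minimal PyTorch sketch of that idea, not the authors' implementation; the module name, feature dimensions, and the use of averaged GloVe-style word vectors for region labels are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LinguisticallyAwareAttention(nn.Module):
    """Illustrative sketch of linguistically-aware attention (not the paper's code).

    Each detected region carries (a) a visual feature vector and (b) a word
    embedding of its predicted class/attribute labels from a pre-trained
    language model. Both are projected into a joint space before computing
    question-guided attention weights over regions.
    """

    def __init__(self, vis_dim=2048, word_dim=300, q_dim=1024, joint_dim=512):
        super().__init__()
        self.vis_proj = nn.Linear(vis_dim, joint_dim)    # visual -> joint space
        self.ling_proj = nn.Linear(word_dim, joint_dim)  # label embeddings -> joint space
        self.q_proj = nn.Linear(q_dim, joint_dim)        # question -> joint space
        self.score = nn.Linear(joint_dim, 1)             # attention logit per region

    def forward(self, vis_feats, label_embs, q_feat):
        # vis_feats:  (B, N, vis_dim)  region features from an object detector
        # label_embs: (B, N, word_dim) e.g. averaged word vectors of each
        #             region's predicted label words (an assumption here)
        # q_feat:     (B, q_dim)       encoded question/caption context
        region = self.vis_proj(vis_feats) + self.ling_proj(label_embs)  # fuse modalities
        joint = torch.tanh(region * self.q_proj(q_feat).unsqueeze(1))   # question-guided
        attn = F.softmax(self.score(joint).squeeze(-1), dim=1)          # (B, N) weights
        attended = torch.bmm(attn.unsqueeze(1), vis_feats).squeeze(1)   # weighted sum
        return attended, attn

# Quick shape check with random tensors
if __name__ == "__main__":
    lat = LinguisticallyAwareAttention()
    v = torch.randn(2, 36, 2048)   # 36 detected regions, Faster R-CNN-style features
    w = torch.randn(2, 36, 300)    # GloVe-style label embeddings (assumed)
    q = torch.randn(2, 1024)       # question encoding
    out, weights = lat(v, w, q)
    print(out.shape, weights.shape)  # torch.Size([2, 2048]) torch.Size([2, 36])

The key design point the sketch tries to capture is that the attention logits depend on the sum of the projected visual and linguistic region representations, so regions whose label embeddings are semantically close to the question can be attended even when visual features alone are ambiguous.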
