Article

Visual relationship detection with recurrent attention and negative sampling

NEUROCOMPUTING (2021)

Journal

NEUROCOMPUTING

Volume 434, Pages 55-66

Publisher

ELSEVIER

DOI: 10.1016/j.neucom.2020.12.099

Keywords

Computer vision; Neural networks; Visual relations

Funding

National Key R&D Program of China [2018YFB1308000]
National Natural Science Foundation of China [61772508, U1713213, 61976143]
Shenzhen Technology Project [JCYJ20170413152535587]
CAS Key Technology Talent Program
Guangdong Technology Program [2016B010108010, 2016B010125003, 2017B010110007]
Shenzhen Engineering Laboratory for 3D Content Generating Technologies [[2017] 476]
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences [2014DP173025]
Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems [2019B121205007]
CAS Key Laboratory of Human-Machine Intelligence-Synergy Systems

Abstract

This paper presents a fast method for visual relationship detection based on recurrent attention and negative sampling. It uses the Word2Vec model to learn semantic (non-visual) features of object categories and binary masks to encode spatial location features, and it applies an undersampling technique to alleviate the influence of imbalanced annotations. Experiments show that the proposed method achieves state-of-the-art results in most cases on the benchmark VRD and Visual Genome (VG) datasets.
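
The summary above pairs Word2Vec category embeddings with binary masks as non-visual input features. This Python sketch shows one way such a feature vector for a subject-object pair could be assembled. It is an illustration under stated assumptions, not the authors' implementation: `TOY_EMBEDDINGS` is a hypothetical stand-in for pretrained Word2Vec vectors (real ones have hundreds of dimensions), and the box format `(x1, y1, x2, y2)`, the mask resolution `grid`, and the helper names `binary_mask` and `pair_features` are all assumptions.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2) in pixels

# Hypothetical toy vectors standing in for pretrained Word2Vec category embeddings.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "person": [0.9, 0.1, 0.0, 0.2],
    "horse": [0.2, 0.8, 0.1, 0.0],
    "hat": [0.1, 0.0, 0.9, 0.3],
}


def binary_mask(box: Box, width: int, height: int, grid: int = 8) -> List[List[int]]:
    """Rasterize a box onto a grid x grid mask: 1 where a cell overlaps the box with positive area."""
    x1, y1, x2, y2 = box
    cell_w, cell_h = width / grid, height / grid
    mask = [[0] * grid for _ in range(grid)]
    for row in range(grid):
        for col in range(grid):
            # Extent of this cell in pixel coordinates.
            cx1, cy1 = col * cell_w, row * cell_h
            cx2, cy2 = cx1 + cell_w, cy1 + cell_h
            if cx1 < x2 and cx2 > x1 and cy1 < y2 and cy2 > y1:
                mask[row][col] = 1
    return mask


def pair_features(subj_cls: str, obj_cls: str, subj_box: Box, obj_box: Box,
                  width: int, height: int, grid: int = 8) -> List[float]:
    """Concatenate both category embeddings with both flattened spatial masks.

    Raises KeyError for a category missing from the (hypothetical) embedding table.
    """
    semantic = TOY_EMBEDDINGS[subj_cls] + TOY_EMBEDDINGS[obj_cls]
    spatial: List[float] = []
    for box in (subj_box, obj_box):
        spatial += [float(v) for row in binary_mask(box, width, height, grid) for v in row]
    return semantic + spatial
```

For example, `pair_features("person", "horse", (0, 0, 50, 100), (25, 50, 100, 100), 100, 100, grid=4)` returns 8 semantic values followed by two flattened 4 x 4 masks, 40 numbers in total. Rasterizing boxes onto a fixed grid gives a layout signal whose size does not depend on the image resolution.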

Detecting relationships between objects is important for a complete understanding of visual scenes and is helpful for applications such as visual question answering, image search, and robotic interaction. However, it is a challenging task due to the high variation in object appearance and interactions, and the often incomplete annotations. In this paper, we propose a fast method for visual relationship detection based on recurrent attention and negative sampling. First, to learn non-visual features, we use the Word2Vec model to extract semantic embedding features of object categories, and use binary masks to represent spatial location features. Next, we integrate the recurrent attention mechanism into the detection pipeline, enabling the network to focus on several specific parts of an image when scoring predicates for a given object pair. Finally, we use an undersampling technique to alleviate the influence of imbalanced annotations, particularly for zero-shot detection. The proposed method is simple, and experiments show that it is efficient and achieves state-of-the-art results on the benchmark VRD and Visual Genome (VG) datasets in most cases. © 2021 Elsevier B.V. All rights reserved.
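
The abstract describes recurrent attention that lets the network look at several parts of an image in turn before scoring predicates for an object pair. Here is a minimal pure-Python sketch of that general idea, not the paper's architecture. It assumes region features are given as plain vectors, attention is a dot product against the current hidden state, and the recurrent update is a plain tanh cell with weight matrices `w_h` and `w_g`. The paper's actual recurrent cell, scoring function, and number of glimpses (`steps`) are not stated here, and every name in the code is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [dot(row, v) for row in m]


def softmax(xs: Sequence[float]) -> Vector:
    peak = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def recurrent_attention(regions: List[Vector], h0: Vector, w_h: Matrix, w_g: Matrix,
                        predicate_weights: Matrix, steps: int = 3) -> Tuple[Vector, List[Vector]]:
    """Take `steps` attention glimpses over image regions, then score predicates.

    Returns (predicate_scores, attention_maps) with one attention map per glimpse.
    """
    h = list(h0)
    maps: List[Vector] = []
    for _ in range(steps):
        # Score every region against the current hidden state; normalize to a distribution.
        alpha = softmax([dot(h, r) for r in regions])
        maps.append(alpha)
        # Glimpse: attention-weighted sum of the region features.
        glimpse = [sum(a * r[d] for a, r in zip(alpha, regions)) for d in range(len(h))]
        # Simple tanh recurrent update (a stand-in for whatever cell the method really uses).
        h = [math.tanh(x + y) for x, y in zip(matvec(w_h, h), matvec(w_g, glimpse))]
    # One linear score per predicate from the final hidden state.
    return matvec(predicate_weights, h), maps
```

With `regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]`, `h0 = [2.0, 0.0]`, identity matrices for `w_h`, `w_g`, and `predicate_weights`, and `steps=2`, the first attention map puts the most weight on the first region, the one most aligned with the initial state. Because each glimpse updates the hidden state, the next attention map can shift toward other parts of the image.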