Article

Saliency Inside: Learning Attentive CNNs for Content-Based Image Retrieval

Journal

IEEE Transactions on Image Processing
Volume 28, Issue 9, Pages 4580-4593 (2019)

Publisher

IEEE (Institute of Electrical and Electronics Engineers)
DOI: 10.1109/TIP.2019.2913513

Keywords

Visual saliency; content-based image retrieval; bag-of-words; convolutional neural networks

Funding

  1. National Key Research and Development Program of China [2017YFC1703503]
  2. National Natural Science Foundation of China [61572065, 61532005, 61672072]
  3. Beijing Nova Program [Z181100006218063]
  4. Fundamental Research Funds for the Central Universities [2018JBZ001]

Abstract

In content-based image retrieval (CBIR), one of the most challenging and ambiguous tasks is to correctly understand the human query intention and measure its semantic relevance to the images in the database. Because visual saliency is remarkably effective at predicting human visual attention, which is closely related to query intention, this paper sets out to explicitly uncover the effect of visual saliency on CBIR via qualitative and quantitative experiments. Toward this end, we first generate fixation density maps for images from a widely used CBIR dataset by using an eye-tracking apparatus. These ground-truth saliency maps are then used to measure the influence of visual saliency on the CBIR task by exploring several plausible ways of incorporating such saliency cues into the retrieval process. We find that visual saliency is indeed beneficial to CBIR, and that the best scheme for incorporating saliency may differ across image retrieval models. Inspired by these findings, this paper presents a two-stream attentive convolutional neural network (CNN) with saliency embedded inside for CBIR. The proposed network has two streams that simultaneously handle two tasks. The main stream focuses on extracting discriminative visual features that are tightly related to semantic attributes. Meanwhile, the auxiliary stream facilitates the main stream by redirecting feature extraction toward the salient image content that a human would attend to. By fusing these two streams into the Main and Auxiliary CNNs (MAC), image similarity can be computed as a human would: conspicuous content is preserved and irrelevant regions are suppressed. Extensive experiments show that the proposed model achieves impressive retrieval performance on four public datasets.
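The architectural details are not given in this abstract, so the following is a minimal PyTorch sketch of the two-stream fusion idea described above, not the authors' implementation. The class name MACSketch, the ResNet-18 backbone, the shape of the auxiliary stream, and the sigmoid-gated weighted pooling are all illustrative assumptions: a main stream extracts semantic feature maps, an auxiliary stream predicts a saliency map, and the saliency map re-weights the feature maps before pooling into a retrieval descriptor.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

class MACSketch(nn.Module):
    """Hypothetical two-stream sketch (not the paper's exact network):
    the main stream extracts semantic features, the auxiliary stream
    predicts a saliency map that gates those features before pooling."""

    def __init__(self, embed_dim=512):
        super().__init__()
        # Main stream: ResNet-18 truncated before global pooling
        # (a stand-in backbone; the paper's backbone is not specified here).
        backbone = models.resnet18(weights=None)
        self.main_stream = nn.Sequential(*list(backbone.children())[:-2])
        # Auxiliary stream: a small conv net producing a 1-channel
        # saliency map, later resized to the feature-map resolution.
        self.aux_stream = nn.Sequential(
            nn.Conv2d(3, 32, 7, stride=4, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 3, padding=1),
        )
        self.fc = nn.Linear(512, embed_dim)

    def forward(self, x):
        feats = self.main_stream(x)                  # (B, 512, H', W')
        sal = self.aux_stream(x)                     # (B, 1, h, w)
        sal = F.interpolate(sal, size=feats.shape[-2:],
                            mode="bilinear", align_corners=False)
        sal = torch.sigmoid(sal)                     # attention weights in (0, 1)
        weighted = feats * sal                       # suppress irrelevant regions
        pooled = weighted.mean(dim=(-2, -1))         # saliency-weighted pooling
        return F.normalize(self.fc(pooled), dim=1)   # L2-normalized descriptor

# Retrieval then reduces to cosine similarity between descriptors.
if __name__ == "__main__":
    model = MACSketch().eval()
    with torch.no_grad():
        q = model(torch.randn(1, 3, 224, 224))       # query image
        db = model(torch.randn(8, 3, 224, 224))      # database images
        scores = db @ q.t()                          # (8, 1) similarity scores
        print(scores.squeeze(1).argsort(descending=True))

Gating the feature maps before pooling is one simple way to realize "preserving conspicuous content and suppressing irrelevant regions"; the paper may fuse the two streams differently.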
