☆ 4.7 Article

PFAN plus plus : Bi-Directional Image-Text Retrieval With Position Focused Attention Network

IEEE TRANSACTIONS ON MULTIMEDIA (2021)

Journal

IEEE TRANSACTIONS ON MULTIMEDIA

Volume 23, Issue -, Pages 3362-3376

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TMM.2020.3024822

Keywords

Visualization; Semantics; Task analysis; Reliability; Postal services; Feature extraction; Fans; Image-text matching; attention mechanism; cross-domain; position embedding learning

Funding

NSFC [61772407, 61732008]
National Key Research and Development [2019YFB2102500]

Ask authors/readers for more resources

Protocol

Community support

Reagent

Community support

Automated Summary New
Abstract

The proposed method in this paper introduces a novel position focusing attention network to investigate the relation between visual image and textual views, enhancing the joint-embedding learning by integrating object positions and a position attention mechanism. Experiments conducted on Flickr30K, MS-COCO, and Tencent-News datasets have shown competitive performance of the proposed method.

Bi-directional image-text retrieval and matching attract much attention recently. This cross-domain task demands a fine understanding of both modalities for learning a measure of different modality data. In this paper, we propose a novel position focused attention network to investigate the relation between the visual and the textual views. This work integrates the prior object position to enhance the visual-text joint-embedding learning. The image is first split into blocks, which are treated as the basic position cells, and the position of an image region is inferred. Then, we propose a position attention to model the relations between the image region and position cells. Finally, we generate a valuable position feature to further enhance the region expression and model a more reliable relationship between the visual image and the textual sentence. Experiments on the popular datasets Flickr30K and MS-COCO show the effectiveness of the proposed method. Besides the public datasets, we also conduct experiments on our collected practical large-scale news dataset (Tencent-News) to validate the practical application value of the proposed method. As far as we know, this is the first attempt to test the performance on the practical application. Our method achieves the competitive performance on all of these three datasets.

PFAN plus plus : Bi-Directional Image-Text Retrieval With Position Focused Attention Network

Journal

IEEE TRANSACTIONS ON MULTIMEDIA

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

PFAN plus plus : Bi-Directional Image-Text Retrieval With Position Focused Attention Network

Journal

IEEE TRANSACTIONS ON MULTIMEDIA

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Export Citation

Share Paper