Article

Multi-view inter-modality representation with progressive fusion for image-text matching

Journal

NEUROCOMPUTING
Volume 535, Issue -, Pages 1-12

Publisher

ELSEVIER
DOI: 10.1016/j.neucom.2023.02.043

Keywords

Image-text matching; Cross-modal matching; Multi-view

This paper proposes a novel multi-view inter-modality representation method to enhance the relationship between image and text by exploring multi-view features. The multi-view strategy provides more complementary semantic clues compared to single-view approaches.
Recently, image-text matching has been intensively explored to bridge vision and language. Previous methods explore the inter-modality relationship of an image-text pair from a single-view feature. However, it is difficult to discover all the abundant information from a single inter-modality relationship. In this paper, a novel Multi-View Inter-Modality Representation with Progressive Fusion (MIRPF) is developed to explore inter-modality relationships from multi-view features. The multi-view strategy provides more complementary and global semantic clues than single-view approaches. In particular, a multi-view inter-modality representation network is constructed to generate multiple inter-modality representations, which provide diverse views for discovering latent image-text relationships. Furthermore, a progressive fusion module fuses the inter-modality features stepwise, making full use of the inherent complementarity between different views. Extensive experiments on Flickr30K and MSCOCO verify the superiority of MIRPF over several existing approaches. The code is available at: https://github.com/jasscia18/MIRPF. © 2023 Published by Elsevier B.V.
