Article

Multi-scale motivated neural network for image-text matching

Journal

MULTIMEDIA TOOLS AND APPLICATIONS
Volume -, Issue -, Pages -

Publisher

SPRINGER
DOI: 10.1007/s11042-023-15321-0

Keywords

Image-text matching; Multi-scale information; Cross-modal interaction; Matching score fusion algorithm


This paper proposes a Multi-Scale Motivated Neural Network (MSMNN) model for image-text matching. The model extracts visual and textual features from three scales and utilizes a cross-modal interaction module to discover the potential relationship between image-text pairs. Furthermore, a matching score fusion algorithm is proposed to fuse matching results from three different levels. Extensive experiments show the effectiveness of the method, achieving competitive results on two well-known datasets.
Existing mainstream image-text matching methods usually measure the relevance of image-text pairs by capturing and aggregating the affinities between textual words and visual regions, while failing to consider the single-scale matching bias caused by the imbalance of image and text information. In this paper, we design a Multi-Scale Motivated Neural Network (MSMNN) model for image-text matching. In contrast to previous single-scale methods, MSMNN encourages neural networks to extract visual and textual features at three scales, namely local features, global features and salient features, which takes full advantage of the complementarity of multi-scale matching to reduce the bias of single-scale matching. We also propose a cross-modal interaction module that fuses visual and textual features during local alignment, so as to discover the potential relationships between image-text pairs. Furthermore, we propose a matching score fusion algorithm to fuse matching results from the three levels, which can be freely applied to other initial image-text matching results with negligible overhead. Extensive experiments validate the effectiveness of our method, which achieves competitive results on two well-known datasets, Flickr30K and MSCOCO, improving the mR evaluation metric by 1.04% and 0.59%, respectively, over an advanced method.
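The abstract does not specify the fusion rule in detail, so the sketch below is a minimal, hypothetical illustration of the general idea: per-scale image-text similarity matrices (local, global, salient) are normalized and combined into a single matching score, which can be applied to any precomputed similarity matrices with negligible overhead. The function name, weights, and normalization are illustrative assumptions, not the authors' exact algorithm.

```python
# Hypothetical sketch of multi-scale matching score fusion (not the paper's exact method).
import numpy as np

def fuse_matching_scores(local_sim, global_sim, salient_sim, weights=(0.4, 0.3, 0.3)):
    """Fuse image-text similarity matrices from three scales.

    Each argument is an (n_images, n_texts) similarity matrix; `weights` is a
    hypothetical per-scale weighting -- the paper's actual fusion rule may differ.
    """
    sims = [local_sim, global_sim, salient_sim]
    normed = []
    for s in sims:
        s = np.asarray(s, dtype=np.float64)
        # Min-max normalize each scale so no single scale dominates the fusion.
        s_min, s_max = s.min(), s.max()
        normed.append((s - s_min) / (s_max - s_min + 1e-12))
    # Weighted sum of the per-scale scores gives the fused matching score.
    return sum(w * s for w, s in zip(weights, normed))

# Example: 5 images x 5 captions with random per-scale similarities.
rng = np.random.default_rng(0)
local_sim = rng.random((5, 5))
global_sim = rng.random((5, 5))
salient_sim = rng.random((5, 5))
fused = fuse_matching_scores(local_sim, global_sim, salient_sim)
print(fused.argmax(axis=1))  # best-matching caption index for each image
```

In this reading, the fusion step works on score matrices alone, which is why it could be layered on top of other matching models' outputs without retraining them.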

Authors


