4.7 Article

Region Reinforcement Network With Topic Constraint for Image-Text Matching

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TCSVT.2021.3060713

Keywords

Semantics; Visualization; Electronic mail; Petroleum; Linear programming; Cameras; Training data; Image-text matching; cross-modal retrieval; topic constraint

Funding

  1. Key Research and Development Plan of Shandong Province [2019GGX101015]
  2. National Natural Science Foundation of China [61671482]
  3. Fundamental Research Funds for the Central Universities [19CX05003A11]
  4. China National Study Abroad Fund

Image-sentence matching, which bridges vision and language, has gained increasing attention. Previous methods ignore the relationships between image regions and treat all region-word pairs equally. This paper proposes the Region Reinforcement Network with Topic Constraint (RRTC) to explore the correspondences between images and texts. It builds a region reinforcement network that infers fine-grained correspondence by considering the relationships among regions and re-assigning region-word similarities. A topic constraint module summarizes the central theme of each image and limits deviation from the original image's meaning.
Image and sentence matching has attracted increasing attention because it connects two important modalities, vision and language. Previous methods aim to find the latent correspondences between image regions and words by aggregating the similarities of region-word pairs. However, these approaches pay little attention to the relationships among the diverse regions in an image and treat the similarities of all region-word pairs equally. Moreover, an excessive focus on fine-grained alignment is likely to distort the true meaning of the original image. In this paper, a novel Region Reinforcement Network with Topic Constraint (RRTC) is proposed to explore the correspondences between images and texts. Specifically, the region reinforcement network is built to infer fine-grained correspondence by considering the relationships among regions and re-assigning region-word similarities. Meanwhile, the topic constraint module is presented to summarize the central theme of each image, which constrains deviation from the meaning of the original image. Extensive experimental results on the MSCOCO and Flickr30k datasets verify the effectiveness of our proposed RRTC.
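The aggregation of region-word similarities that the abstract attributes to previous methods can be illustrated with a minimal sketch. This is not the RRTC model: the function names, the max-over-regions pooling, and the optional `weights` argument are assumptions made for illustration only. Uniform weights correspond to treating all region-word pairs equally; non-uniform weights only hint at what re-assigning similarities could look like.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def image_text_similarity(regions, words, weights=None):
    """Aggregate region-word similarities into one image-sentence score.

    Hypothetical baseline: each word is scored by its best-matching region,
    and the per-word scores are combined by a weighted average.
    With weights=None every word counts equally.
    """
    if not regions or not words:
        raise ValueError("regions and words must be non-empty")
    if weights is None:
        weights = [1.0] * len(words)
    if len(weights) != len(words):
        raise ValueError("need exactly one weight per word")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Best-matching region for each word (max pooling over regions).
    per_word = [max(cosine(r, w) for r in regions) for w in words]
    return sum(wt * s for wt, s in zip(weights, per_word)) / total


# Toy 2-D features: two image regions and two words (illustrative values).
regions = [[1.0, 0.0], [0.0, 1.0]]
words = [[1.0, 0.0], [1.0, 1.0]]
uniform_score = image_text_similarity(regions, words)
weighted_score = image_text_similarity(regions, words, weights=[3.0, 1.0])
```

In this toy case the first word matches a region exactly, so giving it a larger weight raises the overall score relative to the uniform average.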
