Proceedings Paper

Learning Cross-Modal Retrieval with Noisy Labels

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/CVPR46437.2021.00536

Funding

  1. National Key R&D Program of China [2020YFB1406702]
  2. Fundamental Research Funds for the Central Universities [YJ201949]
  3. NSFC [61836006, U19A2078, U19A2081, 61625204]
  4. A*STAR under its AME Programmatic Funds [A1892b0026, A19E3b0099]

This paper proposes a general Multimodal Robust Learning framework (MRL) to learn with multimodal noisy labels and mitigate noisy samples while correlating different modalities. The Robust Clustering loss (RC) is introduced to focus on clean samples instead of noisy ones, and the Multimodal Contrastive loss (MC) aims to maximize mutual information between different modalities to alleviate the interference of noisy samples and cross-modal discrepancy. Extensive experiments on four widely-used multimodal datasets show the effectiveness of the proposed approach compared to 14 state-of-the-art methods.
Recently, cross-modal retrieval has been emerging with the help of deep multimodal learning. However, even for unimodal data, collecting large-scale well-annotated data is expensive and time-consuming, not to mention the additional challenges posed by multiple modalities. Although crowd-sourced annotation, e.g., Amazon Mechanical Turk, can be utilized to mitigate the labeling cost, non-expert annotation inevitably introduces label noise. To tackle this challenge, this paper presents a general Multimodal Robust Learning framework (MRL) for learning with multimodal noisy labels that mitigates noisy samples and correlates distinct modalities simultaneously. Specifically, we propose a Robust Clustering loss (RC) to make the deep networks focus on clean samples instead of noisy ones. Besides, a simple yet effective multimodal loss function, called Multimodal Contrastive loss (MC), is proposed to maximize the mutual information between different modalities, thus alleviating the interference of noisy samples and the cross-modal discrepancy. Extensive experiments are conducted on four widely-used multimodal datasets to demonstrate the effectiveness of the proposed approach in comparison with 14 state-of-the-art methods.
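
The abstract names the RC and MC losses but does not reproduce their formulations. As a rough, hedged illustration of the two ideas, the PyTorch sketch below pairs an InfoNCE-style contrastive term (a standard estimator of a lower bound on cross-modal mutual information) with a generalized cross-entropy term (Zhang & Sabuncu, 2018) standing in for a noise-robust loss; the function names, the choice of GCE, and the default hyperparameters are all assumptions for illustration, not the paper's actual RC/MC definitions.

```python
# Illustrative sketch only: the paper's exact RC/MC losses are not given
# in this record. GCE stands in for the robust term; InfoNCE stands in
# for the mutual-information term.
import torch
import torch.nn.functional as F

def contrastive_loss_mc(img_emb, txt_emb, temperature=0.07):
    """InfoNCE-style objective over paired image/text embeddings (N, d).

    Matched cross-modal pairs act as positives and all other in-batch
    pairs as negatives, which maximizes a lower bound on the mutual
    information between the two modalities.
    """
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature   # (N, N) similarities
    targets = torch.arange(img_emb.size(0), device=img_emb.device)
    # Symmetric cross-entropy: image-to-text and text-to-image retrieval.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

def robust_loss_rc(logits, noisy_labels, q=0.7):
    """Generalized cross-entropy (1 - p^q) / q on the labeled class.

    Relative to standard cross-entropy, each sample's gradient is scaled
    by p^q, so samples whose given label the network finds implausible
    (likely mislabeled) are down-weighted -- the "focus on clean samples"
    behavior the abstract ascribes to RC. `noisy_labels` is an int64
    tensor of class indices of shape (N,).
    """
    probs = F.softmax(logits, dim=-1)
    p = probs.gather(1, noisy_labels.unsqueeze(1)).squeeze(1)
    return ((1.0 - p.clamp_min(1e-6) ** q) / q).mean()
```

In the full method the two terms are presumably combined into a single training objective (e.g., the robust term plus a weighted contrastive term), but the weighting scheme is not stated in this abstract.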
