Ternary Adversarial Networks With Self-Supervision for Zero-Shot Cross-Modal Retrieval

IEEE TRANSACTIONS ON CYBERNETICS (2020)

Journal

IEEE TRANSACTIONS ON CYBERNETICS

Volume 50, Issue 6, Pages 2400-2413

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TCYB.2019.2928180

Keywords

Semantics; Correlation; Knowledge transfer; Standards; Task analysis; Training; Feature extraction; Adversarial learning; cross-modal retrieval; self-supervision; zero-shot learning (ZSL)

Funding

National Natural Science Foundation of China [61602089, 61632007, 61772116, 61572108]
Leading Initiative for Excellent Young Researcher of Ministry of Education, Culture, Sports, Science, and Technology, Japan [16809746]
Research Fund of the Telecommunications Advancement Foundation
Sichuan Science and Technology Program of China [2018GZDZX0032]

Abstract

Given a query instance from one modality (e.g., image), cross-modal retrieval aims to find semantically similar instances from another modality (e.g., text). To perform cross-modal retrieval, existing approaches typically learn a common semantic space from a labeled source set and directly produce common representations in the learned space for the instances in a target set. These methods commonly require that the instances of both sets share the same classes. Intuitively, they may not generalize well to the more practical scenario of zero-shot cross-modal retrieval, in which the instances of the target set contain unseen classes whose semantics are inconsistent with those of the seen classes in the source set. Inspired by zero-shot learning, we propose a novel model called ternary adversarial networks with self-supervision (TANSS) to overcome the limitation of existing methods on this challenging task. Our TANSS approach consists of three parallel subnetworks and an adversarial learning scheme: 1) two semantic feature learning subnetworks that capture the intrinsic data structures of different modalities and preserve the modality relationships via semantic features in the common semantic space; 2) a self-supervised semantic subnetwork that leverages the word vectors of both seen and unseen labels as guidance to supervise the semantic feature learning and enhance the knowledge transfer to unseen labels; and 3) an adversarial learning scheme that maximizes the consistency and correlation of the semantic features between different modalities. The three subnetworks are integrated in TANSS to form an end-to-end network architecture that enables efficient iterative parameter optimization. Comprehensive experiments on three cross-modal datasets show the effectiveness of our TANSS approach compared with state-of-the-art methods for zero-shot cross-modal retrieval.
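
The retrieval step that the abstract takes as its starting point (ranking instances of one modality against a query from another, once both are embedded in a common semantic space) can be illustrated with a minimal sketch. This is not the TANSS model: the feature vectors are hand-made toy values standing in for the outputs of two modality-specific encoders, cosine similarity is an assumed choice of ranking measure, and the names `cosine` and `retrieve` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def retrieve(query, gallery, top_k=2):
    """Return indices of the top_k gallery items most similar to the query.

    Assumes query and gallery vectors already live in the same common
    semantic space (an assumption of this sketch, not learned here).
    """
    scores = [cosine(query, g) for g in gallery]
    order = sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)
    return order[:top_k]


# Toy common-space features (illustrative values only).
image_feats = [[1.0, 0.1, 0.0],
               [0.0, 0.2, 1.0]]
text_feats = [[0.0, 0.1, 0.9],
              [0.9, 0.0, 0.1],
              [0.1, 1.0, 0.0]]

# Image-to-text retrieval: rank every text for each image query.
rankings = [retrieve(q, text_feats) for q in image_feats]
print(rankings)
```

In the zero-shot setting described above, the queries and gallery items would belong to classes never seen when the common space was learned, which is why the quality of that space, rather than this ranking step, is the hard part.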
