Article

Semantic Example Guided Image-to-Image Translation

Journal

IEEE TRANSACTIONS ON MULTIMEDIA
Volume 23, Pages 1654-1665

Publisher

IEEE (Institute of Electrical and Electronics Engineers)
DOI: 10.1109/TMM.2020.3001536

Keywords

Training; Visualization; Semantics; Artificial neural networks; Generators; image generation; image representation

Funding

  1. National Natural Science Foundation of China [61672443]
  2. Hong Kong GRF-RGC General Research Fund [9042322 (CityU 11200116), 9042489 (CityU 11206317), 9042816 (CityU 11209819)]
  3. Hong Kong ECS [21209119]
  4. Hong Kong UGC
  5. CityU of Hong Kong [7200607]

This paper proposes an image-to-image translation method that controls the output based on the semantics of a reference image, generates an auxiliary image to encourage the preservation of semantic features, and trains within a self-supervised framework. By employing non-local blocks and a multi-task architecture, the method improves both the quality and the diversity of the outputs.
Many image-to-image (I2I) translation problems are inherently one-to-many: a single input may have several valid counterparts. Prior works have proposed multi-modal networks that build many-to-many mappings between two visual domains. However, most of these are guided by sampled noise, and others encode the reference image into a latent vector, which discards the reference's semantic information. In this work, we aim to control the output semantically using a reference image. Given a reference image and an input from another domain, we first perform semantic matching between the two images and generate an auxiliary image, which explicitly encourages semantic characteristics to be preserved. A deep network is then used for I2I translation, and the final outputs are expected to be semantically similar to both the input and the reference. However, few paired datasets satisfy this dual similarity, making supervised training impractical, so we build a self-supervised framework for the training stage. We improve the quality and diversity of the outputs by employing non-local blocks and a multi-task architecture. We assess the proposed method through extensive qualitative and quantitative evaluations and present comparisons with several state-of-the-art models.
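The abstract names two concrete architectural ingredients: semantic matching against the reference to produce an auxiliary image, and non-local blocks to improve output quality and diversity. Below is a minimal PyTorch-style sketch of both general techniques. It is an illustration, not the authors' implementation; every name and hyperparameter here (NonLocalBlock, auxiliary_image, the channel reduction factor, the softmax temperature of 100) is an assumption.

```python
# Illustrative sketch only (not the paper's actual code): a standard
# embedded-Gaussian non-local block (Wang et al., 2018) and a toy
# semantic-matching step that softly rearranges reference pixels into an
# "auxiliary image" spatially aligned with the input.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonLocalBlock(nn.Module):
    """Non-local block: every spatial position attends to all others."""
    def __init__(self, channels, reduction=2):
        super().__init__()
        inner = channels // reduction
        self.theta = nn.Conv2d(channels, inner, 1)  # query embedding
        self.phi = nn.Conv2d(channels, inner, 1)    # key embedding
        self.g = nn.Conv2d(channels, inner, 1)      # value embedding
        self.out = nn.Conv2d(inner, channels, 1)    # project back

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)   # (b, hw, inner)
        k = self.phi(x).flatten(2)                     # (b, inner, hw)
        v = self.g(x).flatten(2).transpose(1, 2)       # (b, hw, inner)
        attn = F.softmax(q @ k, dim=-1)                # pairwise affinities
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)                         # residual connection

def auxiliary_image(input_feat, ref_feat, ref_img):
    """Toy semantic matching: correlate input and reference features, then
    use the soft correspondence to warp reference pixels toward the input's
    spatial layout, yielding an auxiliary image."""
    b, c, h, w = input_feat.shape
    fi = F.normalize(input_feat.flatten(2), dim=1)     # (b, c, hw)
    fr = F.normalize(ref_feat.flatten(2), dim=1)       # (b, c, hw)
    corr = F.softmax(fi.transpose(1, 2) @ fr * 100, dim=-1)   # (b, hw, hw)
    ref_pix = F.interpolate(ref_img, size=(h, w)).flatten(2)  # (b, 3, hw)
    aux = (corr @ ref_pix.transpose(1, 2)).transpose(1, 2)
    return aux.reshape(b, 3, h, w)
```

In a pipeline like the one described, input_feat and ref_feat would come from a shared feature extractor, and the auxiliary image would be fed to the translation network together with the input, so that the output can be semantically similar to both the input and the reference.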
