Article

A survey on generative adversarial network-based text-to-image synthesis

Journal

NEUROCOMPUTING
Volume 451, Pages 316-336

Publisher

ELSEVIER
DOI: 10.1016/j.neucom.2021.04.069

Keywords

Deep learning; Generative adversarial network (GAN); Text-to-image synthesis; Scene layout

Funding

  1. National Key Research and Development Plan of China [2020AAA0108903, 2017YFB1300205]
  2. National Natural Science Foundation of China [61573213, 61803227, 61603214, 61673245]
  3. Natural Science Foundation of Shandong Province [2018GGX101039, ZR2020MD041, ZR2020MF077]


Text-to-image synthesis is a new challenge in the field of image synthesis. In earlier research, text-to-image synthesis mainly aligned words and images through retrieval based on sentences or keywords. With the development of deep learning, and especially the application of deep generative models to image synthesis, the field has made promising progress. Generative adversarial networks (GANs) are among the most significant generative models and have been successfully applied in computer vision, natural language processing, and other fields. In this paper, we review recent research in GAN-based text-to-image synthesis and summarize the development of classic and advanced models. The input to GAN-based text-to-image synthesis includes not only general text descriptions, as in earlier studies, but also scene layouts and dialog text. The typical structure of each category is elaborated. General text-based image synthesis is the most common form of text-to-image synthesis, and it is subdivided into three groups according to improvements in text information utilization, network structure, and output control conditions. Through this survey, a detailed and logical overview of the evolution of GAN-based text-to-image synthesis is presented. Finally, open challenges and future directions of text-to-image synthesis are discussed. (c) 2021 Elsevier B.V. All rights reserved.
