Article

Development and deployment of a generative model-based framework for text to photorealistic image generation

Journal

NEUROCOMPUTING
Volume 463, Pages 1-16

Publisher

ELSEVIER
DOI: 10.1016/j.neucom.2021.08.055

Keywords

Text-to-image; Text-to-face; Face synthesis; GAN; AttnGAN


Generating photorealistic images from textual descriptions is challenging, especially for face images. An AttnGAN-based approach is proposed for fine-grained text-to-face generation, producing higher-quality images than existing methods. The approach is evaluated on the CelebA dataset using the FID score, has potential applications in criminal identification, and can be deployed on standalone devices such as a Raspberry Pi for portability.
The task of generating photorealistic images from their textual descriptions is quite challenging. Most existing work in this domain focuses on generating images such as flowers or birds from their textual descriptions, chiefly to validate generative models based on Generative Adversarial Network (GAN) variants and for recreational purposes. Work on photorealistic face image generation, however, remains limited, and the results obtained have not been satisfactory. This is partly due to the absence of concrete data in this domain and the large number of highly specific features/attributes involved in face generation compared to birds or flowers. In this paper, we propose an Attentional Generative Adversarial Network (AttnGAN) for fine-grained text-to-face generation that enables attention-driven multi-stage refinement by employing the Deep Attentional Multimodal Similarity Model (DAMSM). Through extensive experimentation on the CelebA dataset, we evaluate our approach using the Fréchet Inception Distance (FID) score. The output files for the Face2Text dataset are also compared with those of the T2F GitHub project; according to the visual comparison, AttnGAN generates higher-quality images than T2F. Additionally, we compare our methodology with existing approaches on the CelebA dataset and demonstrate that our approach achieves a better FID score, facilitating more realistic image generation. One application of such an approach is criminal identification, where faces are generated from an eyewitness's textual description. Such a method can bring consistency and eliminate the individual biases of an artist drawing faces from the eyewitness's description. Finally, we discuss the deployment of the models on a Raspberry Pi to test how effective they would be on a standalone device, facilitating portability and timely task completion. (c) 2021 Elsevier B.V. All rights reserved.
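The approach rests on word-level attention: at each refinement stage, every image subregion attends over the encoded words of the description so that fine-grained attributes (e.g. "blond hair", "thin eyebrows") steer the corresponding pixels. The sketch below is a minimal NumPy illustration of such an attention step; the function name, tensor shapes, and the plain-softmax formulation are assumptions for exposition, not code from the paper.

```python
import numpy as np

def word_attention(word_feats, region_feats, proj):
    """Minimal word-level attention sketch (AttnGAN-style).

    word_feats:   (D, T)  word embeddings from the text encoder
    region_feats: (Dh, N) image subregion features from the current stage
    proj:         (Dh, D) projection of word features into the image space
    Returns a (Dh, N) word-context matrix, one context vector per subregion.
    """
    e = proj @ word_feats                        # (Dh, T): words in image-feature space
    scores = region_feats.T @ e                  # (N, T): subregion-word similarities
    scores -= scores.max(axis=1, keepdims=True)  # stabilise the softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # attention over words for each subregion
    return e @ attn.T                            # (Dh, N): attended word context

# Illustrative shapes only (no real data):
# ctx = word_attention(np.random.randn(256, 12), np.random.randn(32, 64), np.random.randn(32, 256))
```

In AttnGAN, such word-context vectors are combined with the image features for the next refinement stage, while DAMSM supplies a fine-grained image-text matching loss during training.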
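Evaluation relies on the Fréchet Inception Distance (FID), which compares the statistics of Inception-v3 activations for real and generated faces. A common way to compute it is sketched below; the helper name, the 2048-dimensional activation shape, and the placeholder inputs are illustrative assumptions, not the paper's evaluation code.

```python
import numpy as np
from scipy.linalg import sqrtm

def fid_score(real_acts, fake_acts):
    """FID between two sets of Inception activations, each of shape (N, 2048)."""
    mu_r, mu_f = real_acts.mean(axis=0), fake_acts.mean(axis=0)
    sigma_r = np.cov(real_acts, rowvar=False)
    sigma_f = np.cov(fake_acts, rowvar=False)

    diff = mu_r - mu_f
    covmean = sqrtm(sigma_r @ sigma_f)
    if np.iscomplexobj(covmean):   # numerical noise can introduce tiny imaginary parts
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma_r + sigma_f - 2.0 * covmean))

# Example with random placeholder activations (not real results):
# print(fid_score(np.random.randn(500, 2048), np.random.randn(500, 2048)))
```

A lower FID indicates that the distribution of generated faces is closer to that of the real CelebA faces.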

