Article

From External to Internal: Structuring Image for Text-to-Image Attributes Manipulation

Journal

IEEE Transactions on Multimedia
Volume 25, Pages 7248-7261

Publisher

IEEE (Institute of Electrical and Electronics Engineers)
DOI: 10.1109/TMM.2022.3219677

Keywords

Task analysis; Natural languages; Image representation; Image color analysis; Generators; Generative adversarial networks; Visualization; GANs; image translation; text-to-image attribute manipulation


SIMGAN is a novel GAN-based approach for manipulating image attributes through natural language descriptions. It consists of two components, External Structuring (ExST) and Internal Structuring (InST), and in experiments it significantly outperforms existing methods both quantitatively and qualitatively, greatly improving the efficiency and accuracy of editing text-relevant image attributes compared with the state of the art.
Manipulating visual attributes of an image through a natural language description, known as text-to-image attributes manipulation (T2AM), is a challenging task. Existing approaches tend to search the whole image to manipulate the target instance indicated by a description; thus they often fail to locate and manipulate the precise text-relevant regions, and can even disturb text-irrelevant content, e.g., texture and background. Meanwhile, model efficiency needs to be improved. To tackle these issues, we introduce a novel yet simple GAN-based approach, namely Structuring Image for Manipulating (SIMGAN), to narrow down the optimization areas from external to internal. It consists of two major components: 1) External Structuring (ExST), a pretrained segmentation network, for recognizing and separating the target instances and background of an image; and 2) Internal Structuring (InST) for seeking out and editing the text-relevant attributes of the target instances based on the given description and the masked hierarchical image representations from ExST. Specifically, InST structures target instances from outline to detail by first drawing the sketch and color underpainting of instances with an Outline-Oriented Structuring (OuST), and then enhancing the text-relevant attributes and elaborating details with a Detail-Oriented Structuring (DeST). Extensive experiments on benchmark datasets demonstrate that our framework significantly outperforms the state of the art both quantitatively and qualitatively. Compared with the state-of-the-art method ManiGAN, our approach reduces training time by 88%, and inference is three times faster. In addition, our approach is easily extended to solve the instance-level image-to-image translation problem, and the results exhibit the versatility and effectiveness of our approach. The code is released at https://github.com/qikizh/SIMGAN.
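The external-to-internal dataflow described in the abstract can be sketched in pseudocode. Note this is a minimal illustrative sketch only: every function name below is an assumption for exposition, and the toy arithmetic stands in for the paper's learned GAN components (ExST is actually a pretrained segmentation network; OuST and DeST are generator stages).

```python
# Toy sketch of the SIMGAN two-stage pipeline (all names/operations are
# illustrative assumptions, not the authors' API). Images are represented
# as 2-D lists of pixel intensities in [0, 1].

def external_structuring(image):
    """ExST (sketch): separate target instances from background.
    The real ExST is a pretrained segmentation network; here we fake
    the mask with a fixed intensity threshold."""
    mask = [[1 if px > 0.5 else 0 for px in row] for row in image]
    instances = [[px * m for px, m in zip(row, mrow)]
                 for row, mrow in zip(image, mask)]
    background = [[px * (1 - m) for px, m in zip(row, mrow)]
                  for row, mrow in zip(image, mask)]
    return instances, background

def outline_oriented_structuring(instances, text):
    """OuST (sketch): draw a coarse outline/underpainting of the target
    (here, just quantize the instance pixels)."""
    return [[round(px, 1) for px in row] for row in instances]

def detail_oriented_structuring(coarse, text):
    """DeST (sketch): enhance text-relevant attributes and add detail
    (here, boost instance intensity when the text mentions 'red')."""
    boost = 1.1 if "red" in text else 1.0
    return [[min(px * boost, 1.0) for px in row] for row in coarse]

def simgan_edit(image, text):
    """Full pipeline: edit only the masked instances (ExST -> OuST ->
    DeST), then recompose them over the untouched background."""
    instances, background = external_structuring(image)
    coarse = outline_oriented_structuring(instances, text)
    detailed = detail_oriented_structuring(coarse, text)
    return [[d + b for d, b in zip(drow, brow)]
            for drow, brow in zip(detailed, background)]
```

The point of the structure, per the abstract, is that text-irrelevant regions (the background split off by ExST) flow around the editing stages unchanged, so only text-relevant instance attributes are optimized.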

