Journal
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
Volume 32, Issue 8, Pages 5187-5200
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TCSVT.2021.3136857
Keywords
Birds; Semantics; Visualization; Feature extraction; Image color analysis; Generators; Knowledge based systems; Text-to-image synthesis; multiple captions; prior knowledge
Funding
- National Natural Science Foundation of China [U21A20487, U1913202, U1813205]
- Shenzhen Technology Project [JCYJ20200109113416531, JCYJ20180507182610734, JCYJ20180302145648171]
- CAS Key Technology Talent Program
Text-to-image synthesis is a challenging task, and this paper proposes a novel generation method, RiFeGAN2, to address it. By exploiting a domain-specific constrained model and an attention-based matching mechanism to select compatible candidate captions, the proposed method improves the quality and semantic consistency of the generated images.
Text-to-image synthesis is a challenging task that aims to generate realistic images from a textual description. The description contains limited information compared with the corresponding image and is often ambiguous and abstract, which complicates generation and leads to low-quality images. To address this problem, we propose a novel text-to-image synthesis method, called RiFeGAN2, that enriches the given description. To improve the enrichment quality while accelerating the enrichment process, RiFeGAN2 exploits a domain-specific constrained model to limit the search scope and then uses an attention-based caption matching model to refine compatible candidate captions based on constrained prior knowledge. To improve the semantic consistency between the given description and the synthesized results, RiFeGAN2 employs improved SAEMs, termed SAEM2s, which incorporate centre-attention layers to extract more compact features from the retrieved captions and to emphasize the given description effectively. Finally, multi-caption attentional GANs are used to synthesize images from those features. Experiments on widely used datasets show that the models can generate vivid images from enriched captions and effectively improve semantic consistency.
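The enrichment pipeline summarized in the abstract can be illustrated with a short sketch. The following PyTorch code is not the authors' implementation: the function names, embedding dimension, matching score, and centre-weighted fusion are illustrative assumptions standing in for the paper's constrained retrieval model, attention-based caption matching, and SAEM2 centre-attention emphasis.

# Minimal sketch (assumed, not the authors' code) of caption enrichment:
# retrieve candidate captions from a domain-constrained pool, score them
# with an attention-based matching rule, and fuse the top-k features
# while emphasizing the given description.
import torch
import torch.nn.functional as F

EMB_DIM = 256  # assumed embedding size


def encode(caption_tokens: torch.Tensor, embed: torch.nn.Embedding) -> torch.Tensor:
    """Embed a caption as a (seq_len, EMB_DIM) matrix of word features."""
    return embed(caption_tokens)


def attention_match_score(query: torch.Tensor, candidate: torch.Tensor) -> torch.Tensor:
    """Attend each query word over the candidate's words and average the
    alignments; a stand-in for the attention-based caption matching model."""
    sim = query @ candidate.t() / EMB_DIM ** 0.5          # (Lq, Lc) word similarities
    attn = F.softmax(sim, dim=-1)                          # align query words to candidate words
    aligned = attn @ candidate                             # (Lq, EMB_DIM) aligned features
    return F.cosine_similarity(query, aligned, dim=-1).mean()


def enrich(query: torch.Tensor, pool: list[torch.Tensor], k: int = 3) -> torch.Tensor:
    """Retrieve the k most compatible candidates and fuse sentence-level
    features, weighting the given description most heavily (a crude
    analogue of the centre-attention emphasis in SAEM2)."""
    scores = torch.stack([attention_match_score(query, c) for c in pool])
    topk = scores.topk(min(k, len(pool))).indices
    feats = [query.mean(0)] + [pool[i].mean(0) for i in topk]    # sentence-level features
    feats = torch.stack(feats)                                    # (k+1, EMB_DIM)
    centre = feats[0]                                             # the given description
    weights = F.softmax(feats @ centre / EMB_DIM ** 0.5, dim=0)   # emphasize the centre caption
    return (weights.unsqueeze(-1) * feats).sum(0)                 # fused conditioning vector


if __name__ == "__main__":
    torch.manual_seed(0)
    embed = torch.nn.Embedding(1000, EMB_DIM)
    query = encode(torch.randint(0, 1000, (12,)), embed)          # the given description
    pool = [encode(torch.randint(0, 1000, (15,)), embed) for _ in range(50)]
    cond = enrich(query, pool)
    print(cond.shape)  # torch.Size([256]); this vector would condition the GAN

In the paper's full pipeline, the fused features of the enriched captions condition multi-caption attentional GANs that synthesize the image; here the final vector only stands in for that conditioning input.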