DOI: 10.1145/3487075.3487158
Research article

Text Pared into Scene Graph for Diverse Image Generation

Published: 07 December 2021

Abstract

Although recent advances in conditional generative models have brought remarkable improvements in controlled image generation, generating images containing multiple complex objects remains a challenge. To address this challenge, we propose a module that parses a text description into a scene graph, from which a reasonable scene layout can be generated to keep both the overall image and the individual objects realistic. Our method strengthens the interaction between objects and global semantics by concatenating each object embedding with the text embedding. To preserve local image semantics, a spatially-adaptive normalization (SPADE) layer is added to the generator of our model. We validate our method on Visual Genome and COCO-Stuff, where qualitative results and an ablation study demonstrate our model's ability to generate images with multiple objects and complex relationships.
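The abstract's first step, parsing text into a scene graph, can be illustrated with a toy sketch. This is not the paper's actual parser (which is not reproduced here); it assumes a simplified grammar in which each semicolon-separated clause is a `subject relation object` triple, and it builds the graph as a list of object nodes plus relation edges between node ids:

```python
# Illustrative sketch only: assumes a toy "subject relation object" clause
# grammar, not the parsing module proposed in the paper.

def parse_text_to_scene_graph(text):
    """Parse semicolon-separated clauses into (objects, triples)."""
    objects = []   # unique object names; list index serves as node id
    index = {}     # object name -> node id
    triples = []   # (subject_id, relation, object_id) edges

    def node(name):
        # Reuse the existing node id if the object was already mentioned.
        if name not in index:
            index[name] = len(objects)
            objects.append(name)
        return index[name]

    for clause in text.split(";"):
        parts = clause.split()
        if len(parts) != 3:
            continue  # skip clauses that do not fit the toy grammar
        subj, rel, obj = parts
        triples.append((node(subj), rel, node(obj)))
    return objects, triples

objects, triples = parse_text_to_scene_graph(
    "sheep standing-on grass; tree behind sheep")
```

In such a representation, repeated mentions ("sheep" above) resolve to the same node, which is what lets a layout generator place each object once while honoring all of its relationships.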

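The SPADE layer mentioned in the abstract (Park et al., CVPR 2019) normalizes a feature map per channel and then modulates it elementwise with gamma and beta maps predicted from the semantic layout, so layout information is reinjected at every normalization step rather than being washed out. The sketch below shows the modulation arithmetic only; real SPADE predicts gamma and beta with learned convolutions, whereas here fixed random projections stand in for them:

```python
import numpy as np

# Hedged sketch of spatially-adaptive normalization: the learned conv
# layers of actual SPADE are replaced by fixed random 1x1 projections.
rng = np.random.default_rng(0)

def spade(x, seg, eps=1e-5):
    """x: features (C, H, W); seg: one-hot semantic layout (K, H, W)."""
    C, H, W = x.shape
    K = seg.shape[0]
    # Normalize each channel over its spatial positions.
    mu = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    x_norm = (x - mu) / np.sqrt(var + eps)
    # Stand-ins for the learned convs mapping layout -> (gamma, beta).
    w_gamma = rng.standard_normal((C, K))
    w_beta = rng.standard_normal((C, K))
    gamma = np.einsum("ck,khw->chw", w_gamma, seg)  # (C, H, W)
    beta = np.einsum("ck,khw->chw", w_beta, seg)    # (C, H, W)
    # Spatially varying affine modulation of the normalized features.
    return gamma * x_norm + beta

x = rng.standard_normal((4, 8, 8))
seg = np.zeros((3, 8, 8))
seg[0, :, :4] = 1   # left half labeled class 0
seg[1, :, 4:] = 1   # right half labeled class 1
y = spade(x, seg)
```

Because gamma and beta differ per pixel according to the layout, two regions with identical normalized features but different semantic labels receive different modulations, which is why the abstract credits this layer with preserving local image semantics.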
References

[1] Reed S, Akata Z, Yan X, et al. (2016). Generative adversarial text to image synthesis. In International Conference on Machine Learning (ICML), PMLR, 1060-1069.
[2] Johnson J, Gupta A, Fei-Fei L (2018). Image generation from scene graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1219-1228.
[3] Li Y, Ma T, Bai Y, et al. (2019). PasteGAN: A semi-parametric method to generate image from scene graph. Advances in Neural Information Processing Systems, 32: 3948-3958.
[4] Ashual O, Wolf L (2019). Specifying object attributes and relations in interactive scene generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 4561-4569.
[5] Zhao B, Meng L, Yin W, et al. (2019). Image generation from layout. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 8584-8593.
[6] Sun W, Wu T (2019). Image synthesis from reconfigurable layout and style. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 10531-10540.
[7] Sylvain T, Zhang P, Bengio Y, et al. (2020). Object-centric image generation from layouts. arXiv preprint arXiv:2003.07449.
[8] Tan F, Feng S, Ordonez V (2019). Text2Scene: Generating compositional scenes from textual descriptions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 6710-6719.
[9] Mikolov T, Sutskever I, Chen K, et al. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, 3111-3119.
[10] Lee K H, Palangi H, Chen X, et al. (2019). Learning visual relation priors for image-text matching and image captioning with neural scene graph generators. arXiv preprint arXiv:1909.09953.
[11] Li Y, Ouyang W, Zhou B, et al. (2017). Scene graph generation from objects, phrases and region captions. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 1261-1270.
[12] Cha M, Gwon Y L, Kung H T (2019). Adversarial learning of semantic relevance in text to image synthesis. In Proceedings of the AAAI Conference on Artificial Intelligence, 33(01): 3272-3279.
[13] Schuster S, Krishna R, Chang A, et al. (2015). Generating semantically precise scene graphs from textual descriptions for improved image retrieval. In Proceedings of the Fourth Workshop on Vision and Language, 70-80.
[14] Goodfellow I, Pouget-Abadie J, Mirza M, et al. (2014). Generative adversarial nets. Advances in Neural Information Processing Systems, 27.
[15] Mirza M, Osindero S (2014). Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784.
[16] Reed S E, Akata Z, Mohan S, et al. (2016). Learning what and where to draw. Advances in Neural Information Processing Systems, 29: 217-225.
[17] Zhang H, Xu T, Li H, et al. (2017). StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 5907-5915.
[18] Zhang H, Xu T, Li H, et al. (2018). StackGAN++: Realistic image synthesis with stacked generative adversarial networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(8): 1947-1962.
[19] Xu T, Zhang P, Huang Q, et al. (2018). AttnGAN: Fine-grained text to image generation with attentional generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1316-1324.
[20] Zhang Z, Xie Y, Yang L (2018). Photographic text-to-image synthesis with a hierarchically-nested adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 6199-6208.
[21] Odena A, Olah C, Shlens J (2017). Conditional image synthesis with auxiliary classifier GANs. In International Conference on Machine Learning (ICML), PMLR, 2642-2651.
[22] Anderson P, Fernando B, Johnson M, et al. (2016). SPICE: Semantic propositional image caption evaluation. In European Conference on Computer Vision (ECCV), Springer, 382-398.
[23] Park T, Liu M Y, Wang T C, et al. (2019). Semantic image synthesis with spatially-adaptive normalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2337-2346.
[24] Heusel M, Ramsauer H, Unterthiner T, et al. (2017). GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in Neural Information Processing Systems, 30.
[25] Isola P, Zhu J Y, Zhou T, et al. (2017). Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1125-1134.
[26] Johnson J, Alahi A, Fei-Fei L (2016). Perceptual losses for real-time style transfer and super-resolution. In European Conference on Computer Vision (ECCV), Springer, 694-711.
[27] Lin T Y, Maire M, Belongie S, et al. (2014). Microsoft COCO: Common objects in context. In European Conference on Computer Vision (ECCV), Springer, 740-755.
[28] Krishna R, Zhu Y, Groth O, et al. (2017). Visual Genome: Connecting language and vision using crowdsourced dense image annotations. International Journal of Computer Vision, 123(1): 32-73.
[29] Salimans T, Goodfellow I, Zaremba W, et al. (2016). Improved techniques for training GANs. Advances in Neural Information Processing Systems, 29: 2234-2242.
[30] Simonyan K, Zisserman A (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
[31] Zhang R, Isola P, Efros A A, et al. (2018). The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 586-595.

        Published In

        CSAE '21: Proceedings of the 5th International Conference on Computer Science and Application Engineering
        October 2021
        660 pages
        ISBN:9781450389853
        DOI:10.1145/3487075

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Author Tags

        1. Image-text retrieval
        2. Scene graph
        3. Text-to-image generation

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Funding Sources

        • National Key Research and Development Plan of China

        Conference

        CSAE 2021

        Acceptance Rates

        Overall Acceptance Rate 368 of 770 submissions, 48%

