Context-Aware GANs for Image Generation from Multimodal Queries

Nakamura, Kenki; Ma, Qiang

doi:10.1007/978-3-030-27615-7_33

Kenki Nakamura & Qiang Ma

Conference paper
First Online: 03 August 2019

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11706)

Abstract

In this paper, we propose a novel model of context-aware generative adversarial networks (GANs) that generates images from a multimodal query: a pair consisting of a condition text and a context image. In our study, context is defined as the objects and concepts that appear in the image but not in the text. We construct two object trees that express the objects, and their hierarchical relationships, described in the input condition text and in the context image, respectively. We compare these two object trees to extract the context. Then, based on the extracted context, we generate parameters for the generator of the context-aware GANs. To guarantee that the generated image is related to the multimodal query, i.e., to both the condition text and the context image, we also construct a context discriminator in addition to a condition discriminator similar to that of conditional GANs. The experimental results reveal that the proposed model generates higher-resolution images containing more contextual information than those of previous models.
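The context-extraction step (comparing the text's object tree with the image's object tree) can be illustrated with a small sketch. The abstract does not say how the trees are represented or compared, so the Python below is hypothetical: it assumes each tree is made of labeled nodes under a shared root such as `scene`, that objects are matched by exact label, and that the context is every object in the context-image tree whose label is absent from the condition-text tree. `ObjectNode`, `labels`, and `extract_context` are illustrative names, not the authors' code.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectNode:
    """One object in a hypothetical object tree; children are parts or attached objects."""
    label: str
    children: list["ObjectNode"] = field(default_factory=list)


def labels(tree: ObjectNode) -> set[str]:
    """Collect every object label in the tree, depth-first."""
    found = {tree.label}
    for child in tree.children:
        found |= labels(child)
    return found


def extract_context(image_tree: ObjectNode, text_tree: ObjectNode) -> list[tuple[str, str]]:
    """Return (parent, object) pairs for objects in the image tree but not in the text tree.

    Assumptions (not from the paper): both trees share a placeholder root such as
    "scene", objects are matched by exact label, and the parent label is kept so
    each context object's position in the hierarchy is preserved.
    """
    text_labels = labels(text_tree)
    context: list[tuple[str, str]] = []

    def walk(node: ObjectNode) -> None:
        for child in node.children:
            if child.label not in text_labels:
                context.append((node.label, child.label))
            walk(child)

    walk(image_tree)
    return context


if __name__ == "__main__":
    # Condition text mentions only a dog; the context image also shows a collar and grass.
    text_tree = ObjectNode("scene", [ObjectNode("dog")])
    image_tree = ObjectNode(
        "scene", [ObjectNode("dog", [ObjectNode("collar")]), ObjectNode("grass")]
    )
    print(extract_context(image_tree, text_tree))
    # prints [('dog', 'collar'), ('scene', 'grass')]
```

Keeping the parent label is one simple way to retain the hierarchical relationships the trees encode; the paper's actual comparison, and how the extracted context is then turned into generator parameters, may differ.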

This work is partly supported by MIC SCOPE (172307001).

Notes

1. For details of these relationships, please refer to [2].

References

[1] Antol, S., Agrawal, A., Lu, J., Mitchell, M., Batra, D., Zitnick, C.L., Parikh, D.: VQA: visual question answering. In: ICCV 2015, pp. 2425–2433 (2015)
[2] Chen, D., Manning, C.: A fast and accurate dependency parser using neural networks. In: EMNLP 2014, pp. 740–750 (2014)
[3] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: NIPS 2014, pp. 2672–2680 (2014)
[4] Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)
[5] Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of GANs for improved quality, stability, and variation. CoRR abs/1710.10196 (2017). http://arxiv.org/abs/1710.10196
[6] Ma, Q.: Utilization and analysis of user generated contents toward personalized and distributed sightseeing. Syst. Control Inf. 63(1), 32–37 (2019)
[7] Ma, Q.: Forefront of sightseeing informatics - technologies of collective intelligence for promotion of personalized and distributed sightseeing. Inf. Process. 58(3), 220–226 (2017)
[8] Zhuang, C.Y., Ma, Q., Liang, X.F., Yoshikawa, M.: Discovering obscure sightseeing spots by analysis of geo-tagged social images. In: ASONAM 2015, pp. 590–595 (2015)
[9] Nakamura, K., Ma, Q.: Context-aware image generation by using generative adversarial networks. In: ISM 2017, pp. 516–523 (2017)
[10] Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: EMNLP 2014, pp. 1532–1543 (2014)
[11] Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. CoRR abs/1511.06434 (2015). http://arxiv.org/abs/1511.06434
[12] Reed, S.E., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., Lee, H.: Generative adversarial text to image synthesis. In: ICML 2016, pp. 1060–1069 (2016)
[13] Teney, D., Liu, L., van den Hengel, A.: Graph-structured representations for visual question answering. In: CVPR 2017, pp. 3233–3241 (2017)
[14] Vondrick, C., Pirsiavash, H., Torralba, A.: Generating videos with scene dynamics. In: NIPS 2016, pp. 613–621 (2016)
[15] Zhang, H., Xu, T., Li, H.: StackGAN: text to photo-realistic image synthesis with stacked generative adversarial networks. In: ICCV 2017, pp. 5908–5916 (2017)

Author information

Authors and Affiliations

Graduate School of Informatics, Kyoto University, Kyoto, Japan
Kenki Nakamura & Qiang Ma

Authors

Kenki Nakamura
Qiang Ma

Corresponding author

Correspondence to Qiang Ma.

Editor information

Editors and Affiliations

Clausthal University of Technology, Clausthal-Zellerfeld, Germany
Sven Hartmann
Johannes Kepler University of Linz, Linz, Austria
Josef Küng
The University of Texas at Arlington, Arlington, TX, USA
Sharma Chakravarthy
Johannes Kepler University of Linz, Linz, Austria
Gabriele Anderst-Kotsis
Software Competence Center Hagenberg, Hagenberg im Mühlkreis, Austria
A Min Tjoa
Johannes Kepler University of Linz, Linz, Austria
Ismail Khalil

About this paper

Cite this paper

Nakamura, K., Ma, Q. (2019). Context-Aware GANs for Image Generation from Multimodal Queries. In: Hartmann, S., Küng, J., Chakravarthy, S., Anderst-Kotsis, G., Tjoa, A., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2019. Lecture Notes in Computer Science, vol 11706. Springer, Cham. https://doi.org/10.1007/978-3-030-27615-7_33

DOI: https://doi.org/10.1007/978-3-030-27615-7_33
Published: 03 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-27614-0
Online ISBN: 978-3-030-27615-7
eBook Packages: Computer Science, Computer Science (R0)
