Context-Aware GANs for Image Generation from Multimodal Queries

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11706)

Abstract

In this paper, we propose a novel model of context-aware generative adversarial networks (GANs) that generates images from a multimodal query: a pair consisting of a condition text and a context image. In our study, context is defined as the objects and concepts that appear in the image but not in the text. We construct two object trees that express the objects, and the hierarchical relationships among them, described in the input condition text and the context image, respectively. We compare these two object trees to extract the context. Then, based on the extracted context, we generate parameters for the generator in the context-aware GANs. To guarantee that the generated image is related to the multimodal query, i.e., to both the condition text and the context image, we construct a context discriminator in addition to a condition discriminator similar to that of conditional GANs. The experimental results reveal that the proposed model generates images with higher resolutions and more contextual information than previous models.
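
The context definition above (objects that appear in the image but not in the text) can be illustrated with a minimal sketch. The `ObjectNode` class, the helper names, and the example labels are hypothetical and are not taken from the paper; the sketch compares only node labels and ignores the hierarchical relationships, which the paper's actual tree comparison may well use.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ObjectNode:
    """A node in a hypothetical object tree: a label plus child objects."""
    label: str
    children: list[ObjectNode] = field(default_factory=list)


def labels(root: ObjectNode) -> set[str]:
    """Collect every label in the tree rooted at `root`."""
    found = {root.label}
    for child in root.children:
        found |= labels(child)
    return found


def extract_context(text_tree: ObjectNode, image_tree: ObjectNode) -> set[str]:
    """Return the labels that occur in the image tree but not in the text tree.

    Assumption: context is approximated as a plain set difference of labels.
    """
    return labels(image_tree) - labels(text_tree)


# Made-up example: the text mentions only a bird; the image also shows a branch and a leaf.
text_tree = ObjectNode("scene", [ObjectNode("bird")])
image_tree = ObjectNode(
    "scene",
    [ObjectNode("bird"), ObjectNode("branch", [ObjectNode("leaf")])],
)
print(sorted(extract_context(text_tree, image_tree)))  # ['branch', 'leaf']
```

A set difference is the simplest reading of "in the image but not in the text". A comparison that also respects parent-child structure would need to match subtrees rather than individual labels.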

This work is partly supported by MIC SCOPE(172307001).

Notes

  1. For details of these relationships, please refer to [2].

References

  1. Antol, S., Agrawal, A., Lu, J., Mitchell, M., Batra, D., Zitnick, C.L., Parikh, D.: VQA: visual question answering. ICCV 2015, 2425–2433 (2015)

  2. Chen, D., Manning, C.: A fast and accurate dependency parser using neural networks. EMNLP 2014, 740–750 (2014)

  3. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: NIPS 2014, pp. 2672–2680 (2014)

  4. Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)

  5. Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of GANs for improved quality, stability, and variation. CoRR abs/1710.10196 (2017). http://arxiv.org/abs/1710.10196

  6. Ma, Q.: Utilization and analysis of user generated contents toward personalized and distributed sightseeing. Syst. Control Inf. 63(1), 32–37 (2019)

  7. Ma, Q.: Forefront of sightseeing informatics - technologies of collective intelligence for promotion of personalized and distributed sightseeing. Inf. Process. 58(3), 220–226 (2017)

  8. Zhuang, C.Y., Ma, Q., Liang, X.F., Yoshikawa, M.: Discovering obscure sightseeing spots by analysis of geo-tagged social images. ASONAM 2015, 590–595 (2015)

  9. Nakamura, K., Ma, Q.: Context-aware image generation by using generative adversarial networks. ISM 2017, 516–523 (2017)

  10. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. EMNLP 2014, 1532–1543 (2014)

  11. Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. CoRR abs/1511.06434 (2015). http://arxiv.org/abs/1511.06434

  12. Reed, S.E., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., Lee, H.: Generative adversarial text to image synthesis. ICML 2016, 1060–1069 (2016)

  13. Teney, D., Liu, L., van den Hengel, A.: Graph-structured representations for visual question answering. CVPR 2017, 3233–3241 (2017)

  14. Vondrick, C., Pirsiavash, H., Torralba, A.: Generating videos with scene dynamics. NIPS 2016, 613–621 (2016)

  15. Zhang, H., Xu, T., Li, H.: StackGAN: text to photo-realistic image synthesis with stacked generative adversarial networks. ICCV 2017, 5908–5916 (2017)

Author information

Corresponding author

Correspondence to Qiang Ma.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Nakamura, K., Ma, Q. (2019). Context-Aware GANs for Image Generation from Multimodal Queries. In: Hartmann, S., Küng, J., Chakravarthy, S., Anderst-Kotsis, G., Tjoa, A., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2019. Lecture Notes in Computer Science, vol 11706. Springer, Cham. https://doi.org/10.1007/978-3-030-27615-7_33

  • DOI: https://doi.org/10.1007/978-3-030-27615-7_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-27614-0

  • Online ISBN: 978-3-030-27615-7

  • eBook Packages: Computer Science, Computer Science (R0)
