Latent-Space Data Augmentation for Visually-Grounded Language Understanding

  • Conference paper

Advances in Artificial Intelligence (JSAI 2019)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1128)

Abstract

This paper is an extended version of a selected paper from JSAI 2019. In this paper, we study data augmentation for visually-grounded language understanding in the context of a picking task. A typical picking task consists of predicting a target object specified by an ambiguous instruction, e.g., “Pick up the yellow toy near the bottle”. We specifically show that existing methods for understanding such instructions can be improved by data augmentation. More explicitly, the MTCM [1] and MTCM-GAN [2] achieve better results with data augmentation when it is applied to latent-space features instead of raw features. Additionally, our results show that latent-space data augmentation improves network accuracy more than regularization methods do.
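To make the approach concrete, the sketch below shows one simple instance of latent-space data augmentation: instead of perturbing raw inputs (images or sentences), feature vectors already extracted by an encoder (e.g., CNN visual features or linguistic embeddings) are perturbed with additive Gaussian noise, and the noisy copies inherit the labels of their originals. This is a minimal illustration under assumed settings; the function name augment_latent, the noise model, and the noise scale noise_std are illustrative choices, not the exact scheme used in the paper.

```python
import numpy as np

def augment_latent(features, n_copies=4, noise_std=0.05, seed=0):
    """Hypothetical latent-space augmentation: add Gaussian noise to
    pre-extracted feature vectors of shape (N, D), not to raw inputs.

    Returns the originals plus n_copies noisy variants, i.e. an array
    of shape (N * (n_copies + 1), D). Each noisy copy keeps the label
    of the sample it was derived from.
    """
    rng = np.random.default_rng(seed)
    copies = [features]
    for _ in range(n_copies):
        # Perturb in latent space; noise_std is an assumed hyperparameter.
        copies.append(features + rng.normal(0.0, noise_std, size=features.shape))
    return np.concatenate(copies, axis=0)

# Example with dummy 512-dim latent features for 32 samples.
latent = np.random.default_rng(1).random((32, 512), dtype=np.float32)
labels = np.arange(32)
augmented = augment_latent(latent)
augmented_labels = np.tile(labels, 5)  # replicate labels for the 5 stacked copies
print(augmented.shape, augmented_labels.shape)  # (160, 512) (160,)
```

One design note: perturbing compact latent vectors is cheap and independent of the encoder architecture, whereas raw-input augmentation must be tailored to each modality (e.g., cropping for images, paraphrasing for sentences).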

References

  1. Magassouba, A., Sugiura, K., Kawai, H.: A multimodal target-source classifier model for object fetching from natural language instructions. In: Proceedings of the National Congress of the Japanese Society for Artificial Intelligence, pp. 2D3E403–2D3E403 (2019). (in Japanese)

  2. Magassouba, A., Sugiura, K., Trinh Quoc, A., Kawai, H.: Understanding natural language instructions for fetching daily objects using GAN-based multimodal target-source classification. IEEE RA-L 4(4), 3884–3891 (2019)

  3. Iocchi, L., Holz, D., Ruiz-del-Solar, J., Sugiura, K., van der Zant, T.: RoboCup@Home: analysis and results of evolving competitions for domestic and service robots. Artif. Intell. 229, 258–281 (2015)

  4. Yu, L., Tan, H., Bansal, M., Berg, T.L.: A joint speaker-listener-reinforcer model for referring expressions. In: CVPR, vol. 2 (2017)

  5. Magassouba, A., Sugiura, K., Kawai, H.: A multimodal classifier generative adversarial network for carry and place tasks from ambiguous language instructions. IEEE RA-L 3(4), 3113–3120 (2018)

  6. Magassouba, A., Sugiura, K., Kawai, H.: Multimodal attention branch network for perspective-free sentence generation. In: Conference on Robot Learning (CoRL) (2019)

  7. Cohen, V., Burchfiel, B., Nguyen, T., Gopalan, N., Tellex, S., Konidaris, G.: Grounding language attributes to objects using Bayesian eigenobjects. In: IEEE IROS (2019)

  8. Nagaraja, V.K., Morariu, V.I., Davis, L.S.: Modeling context between objects for referring expression understanding. In: ECCV, pp. 792–807 (2016)

  9. Hatori, J., et al.: Interactively picking real-world objects with unconstrained spoken language instructions. In: IEEE ICRA, pp. 3774–3781 (2018)

  10. Shridhar, M., Hsu, D.: Interactive visual grounding of referring expressions for human-robot interaction. In: RSS (2018)

  11. Springenberg, J.T.: Unsupervised and semi-supervised learning with categorical generative adversarial networks. In: ICLR (2016)

  12. Sugiura, K., Kawai, H.: Grounded language understanding for manipulation instructions using GAN-based classification. In: IEEE ASRU (2017)

  13. Odena, A., Olah, C., Shlens, J.: Conditional image synthesis with auxiliary classifier GANs. In: ICML, pp. 2642–2651 (2017)

  14. Anderson, P., Wu, Q., Teney, D., Bruce, J., Johnson, M., Sünderhauf, N., Reid, I., Gould, S., van den Hengel, A.: Vision-and-language navigation: interpreting visually-grounded navigation instructions in real environments. In: CVPR, pp. 3674–3683 (2018)

  15. Bousmalis, K., et al.: Using simulation and domain adaptation to improve efficiency of deep robotic grasping. In: Proceedings of IEEE ICRA, pp. 4243–4250 (2018)

  16. Tan, J., Zhang, T., Coumans, E., Iscen, A., Bai, Y., Hafner, D., Bohez, S., Vanhoucke, V.: Sim-to-real: learning agile locomotion for quadruped robots. In: RSS (2018)

  17. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR (2015)

  18. Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: NAACL-HLT, pp. 4171–4186 (2019)

  19. Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.: Improved training of Wasserstein GANs. In: NIPS, pp. 5769–5779 (2017)

  20. Miyato, T., Kataoka, T., Koyama, M., Yoshida, Y.: Spectral normalization for generative adversarial networks. In: ICLR (2018)


Acknowledgement

This work was partially supported by JST CREST, SCOPE and NEDO.

Author information

Correspondence to Aly Magassouba.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Magassouba, A., Sugiura, K., Kawai, H. (2020). Latent-Space Data Augmentation for Visually-Grounded Language Understanding. In: Ohsawa, Y., et al. Advances in Artificial Intelligence. JSAI 2019. Advances in Intelligent Systems and Computing, vol 1128. Springer, Cham. https://doi.org/10.1007/978-3-030-39878-1_17
