DOI: 10.1145/3371382.3378265

Multimodal Representation Learning for Human Robot Interaction

Published: 01 April 2020

Abstract

We present a neural-network-based system that learns a multimodal representation of images and words. This representation supports bidirectional grounding between words and the visual attributes they denote, such as colour, size and object name. We also present a new dataset captured specifically for this task.
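The abstract does not describe the network architecture. One common way to learn such a shared representation, following Ngiam et al.'s work on multimodal deep learning, is a bimodal autoencoder trained on examples in which one modality is sometimes zeroed out, so that either modality alone can be decoded into both. The following is a minimal sketch under that assumption, not the authors' system: the toy colour data, the layer sizes, and every name (`make_toy_data`, `BimodalAutoencoder`, `train`, `word_to_image`, `image_to_word`) are hypothetical, and random "prototype" vectors stand in for real image features.

```python
# Illustrative sketch (assumption): a bimodal autoencoder in the style of
# Ngiam et al.; NOT the paper's architecture. Names and data are hypothetical.
import numpy as np


def make_toy_data(n_per_word=20, img_dim=16, seed=0):
    """Hypothetical data: each colour word gets a random visual prototype;
    'image features' are noisy copies of that prototype."""
    rng = np.random.default_rng(seed)
    words = ["red", "green", "blue"]
    prototypes = rng.normal(size=(len(words), img_dim))
    labels = np.repeat(np.arange(len(words)), n_per_word)
    images = prototypes[labels] + 0.1 * rng.normal(size=(labels.size, img_dim))
    texts = np.eye(len(words))[labels]  # one-hot word vectors
    return words, images, texts


class BimodalAutoencoder:
    """One shared tanh layer fed by both modalities, one linear decoder each."""

    def __init__(self, img_dim, txt_dim, hid_dim=8, seed=0):
        rng = np.random.default_rng(seed)
        self.img_dim, self.txt_dim = img_dim, txt_dim
        self.p = {
            "We_i": 0.1 * rng.normal(size=(img_dim, hid_dim)),
            "We_t": 0.1 * rng.normal(size=(txt_dim, hid_dim)),
            "Wd_i": 0.1 * rng.normal(size=(hid_dim, img_dim)),
            "Wd_t": 0.1 * rng.normal(size=(hid_dim, txt_dim)),
            "b": np.zeros(hid_dim), "c_i": np.zeros(img_dim),
            "c_t": np.zeros(txt_dim),
        }

    def encode(self, img, txt):
        """Map (possibly zeroed) image and word inputs into the shared space."""
        p = self.p
        return np.tanh(img @ p["We_i"] + txt @ p["We_t"] + p["b"])

    def decode(self, h):
        """Reconstruct both modalities from the shared representation."""
        p = self.p
        return h @ p["Wd_i"] + p["c_i"], h @ p["Wd_t"] + p["c_t"]

    def loss_and_grads(self, img_in, txt_in, img_tgt, txt_tgt):
        """Half mean squared reconstruction error and its gradients."""
        p, n = self.p, img_in.shape[0]
        h = self.encode(img_in, txt_in)
        r_i, r_t = self.decode(h)
        e_i, e_t = r_i - img_tgt, r_t - txt_tgt
        loss = 0.5 * (np.sum(e_i ** 2) + np.sum(e_t ** 2)) / n
        d_i, d_t = e_i / n, e_t / n
        # Backpropagate through both decoders and the tanh nonlinearity.
        dz = (d_i @ p["Wd_i"].T + d_t @ p["Wd_t"].T) * (1.0 - h ** 2)
        grads = {"Wd_i": h.T @ d_i, "c_i": d_i.sum(axis=0),
                 "Wd_t": h.T @ d_t, "c_t": d_t.sum(axis=0),
                 "We_i": img_in.T @ dz, "We_t": txt_in.T @ dz,
                 "b": dz.sum(axis=0)}
        return loss, grads


def augmented_batch(images, texts):
    """Full pairs plus image-only and text-only copies (missing input zeroed);
    the reconstruction target is always the full pair."""
    zi, zt = np.zeros_like(images), np.zeros_like(texts)
    img_in = np.vstack([images, images, zi])
    txt_in = np.vstack([texts, zt, texts])
    return img_in, txt_in, np.vstack([images] * 3), np.vstack([texts] * 3)


def train(model, images, texts, steps=2000, lr=0.01):
    """Full-batch gradient descent; returns the loss before each step."""
    batch = augmented_batch(images, texts)
    history = []
    for _ in range(steps):
        loss, grads = model.loss_and_grads(*batch)
        history.append(loss)
        for name, g in grads.items():
            model.p[name] -= lr * g
    return history


def word_to_image(model, txt):
    """Word -> vision: encode the word alone, decode image features."""
    h = model.encode(np.zeros((txt.shape[0], model.img_dim)), txt)
    return model.decode(h)[0]


def image_to_word(model, img, words):
    """Vision -> word: encode the image alone, pick the best-scoring word."""
    h = model.encode(img, np.zeros((img.shape[0], model.txt_dim)))
    return [words[i] for i in model.decode(h)[1].argmax(axis=1)]


if __name__ == "__main__":
    words, images, texts = make_toy_data()
    model = BimodalAutoencoder(images.shape[1], texts.shape[1])
    history = train(model, images, texts)
    print(history[0], history[-1], image_to_word(model, images[::20], words))
```

Zeroing one modality in two thirds of the training rows is what lets a single input be decoded into both modalities, which is one way to obtain the bidirectional grounding the abstract describes. The single hidden layer and hand-written gradients only keep the sketch self-contained; a practical system would more likely use learned image features and a deep-learning framework.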

Cited By

  • (2022) Vector Semiotic Model for Visual Question Answering. Cognitive Systems Research 71:C, pp. 52-63. DOI: 10.1016/j.cogsys.2021.09.001. Online publication date: 1 Jan 2022.
  • (2020) An Improved Deep Canonical Correlation Fusion Method for Underwater Multisource Data. IEEE Access 8, pp. 146300-146307. DOI: 10.1109/ACCESS.2020.3014495. Online publication date: 2020.
  • IMPRINT: Interactional Dynamics-aware Motion Prediction in Teams using Multimodal Context. ACM Transactions on Human-Robot Interaction. DOI: 10.1145/3626954.

Information

Published In

HRI '20: Companion of the 2020 ACM/IEEE International Conference on Human-Robot Interaction
March 2020
702 pages
ISBN:9781450370578
DOI:10.1145/3371382
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. datasets
  2. neural networks
  3. robotics
  4. symbol grounding
  5. unsupervised learning

Qualifiers

  • Abstract

Conference

HRI '20

Acceptance Rates

Overall Acceptance Rate 192 of 519 submissions, 37%

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 9
  • Downloads (last 6 weeks): 1
Reflects downloads up to 16 Feb 2025
