Abstract
Human communication is highly multimodal, encompassing speech, gesture, gaze, facial expression, and body language. Robots serving as human teammates must act on such multimodal communicative input, even when the message is not clear from any single modality. In this paper, we explore a method for better understanding complex, situated communication by leveraging coordinated natural language, gesture, and context. These three channels have largely been treated separately, but considering them jointly can yield gains in comprehension [1, 12].
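The kind of joint consideration the abstract describes can be illustrated with a minimal sketch (not the paper's actual model): treating each modality as independent evidence about a referent and combining per-modality likelihoods with a naive-Bayes-style product. All object names and scores below are hypothetical.

```python
# Illustrative sketch, assuming independent per-modality evidence over
# a fixed set of candidate referents. Not the method from the paper.

def fuse_modalities(candidates, modality_scores):
    """Combine per-modality likelihoods P(observation | referent).

    candidates: list of referent names
    modality_scores: dict mapping modality -> {referent: likelihood}
    Returns a normalized posterior over candidates (uniform prior).
    """
    posterior = {c: 1.0 for c in candidates}
    for scores in modality_scores.values():
        for c in candidates:
            # Small floor so a missing score does not zero out a candidate.
            posterior[c] *= scores.get(c, 1e-9)
    total = sum(posterior.values())
    return {c: p / total for c, p in posterior.items()}

# Speech alone is ambiguous ("the red one") and gesture alone is coarse,
# but their product concentrates probability on a single referent.
scores = {
    "speech":  {"red_mug": 0.45, "red_block": 0.45, "blue_mug": 0.10},
    "gesture": {"red_mug": 0.70, "red_block": 0.10, "blue_mug": 0.20},
}
fused = fuse_modalities(["red_mug", "red_block", "blue_mug"], scores)
best = max(fused, key=fused.get)
```

Here neither modality resolves the referent on its own, but the fused posterior does; the paper pursues this intuition with coordinated language, gesture, and context rather than this toy product model.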
References
Artzi, Y., Zettlemoyer, L.: UW SPF: The University of Washington Semantic Parsing Framework (2013)
Chen, Q., Georganas, N.D., Petriu, E.M.: Real-time vision-based hand gesture recognition using haar-like features. In: Instrumentation and Measurement Technology Conference Proceedings, IMTC 2007, pp. 1–6. IEEE, May 2007. doi:10.1109/IMTC.2007.379068
Eldon, M., Whitney, D., Tellex, S.: Interpreting Multimodal Referring Expressions in Real Time (2015). https://edge.edx.org/assetv1:Brown+CSCI2951-K+2015_T2+type@asset+block@eldon15.pdf
Escalera, S., et al.: Chalearn multi-modal gesture recognition 2013: grand challenge and workshop summary. In: Proceedings of the 15th ACM on International Conference on Multimodal Interaction, pp. 365–368. ACM (2013)
Escalera, S., et al.: Multi-modal gesture recognition challenge 2013: dataset and results. In: Proceedings of the 15th ACM on International Conference on Multimodal Interaction, pp. 445–452. ACM (2013)
Gawron, P., et al.: Eigengestures for natural human computer interface. arXiv:1105.1293 [cs] 103, pp. 49–56 (2011). doi:10.1007/978-3-642-23169-8_6, http://arxiv.org/abs/1105.1293. Accessed 29 Oct 2015
Ge, S.S., Yang, Y., Lee, T.H.: Hand gesture recognition and tracking based on distributed locally linear embedding. Image Vis. Comput. 26(12), 1607–1620 (2008). ISSN 0262-8856. doi:10.1016/j.imavis.2008.03.004, http://www.sciencedirect.com/science/article/pii/S0262885608000693. Accessed 18 Nov 2015
Guyon, I., et al.: The ChaLearn gesture dataset (CGD 2011). Mach. Vis. Appl. 25(8), 1929–1951 (2014). ISSN 0932-8092, 1432-1769. doi:10.1007/s00138-014-0596-3, http://link.springer.com/article/10.1007/s00138-014-0596-3. Accessed 02 Mar 2016
Huang, C.-M., Mutlu, B.: Modeling and evaluating narrative gestures for humanlike robots. In: Robotics: Science and Systems (2013)
Jetley, S., et al.: Prototypical Priors: From Improving Classification to Zero-Shot Learning. arXiv:1512.01192 [cs] (3 December 2015). http://arxiv.org/abs/1512.01192. Accessed 29 Jan 2016
Jones, K.S.: A statistical interpretation of term specificity and its application in retrieval. J. Documentation 28, 11–21 (1972)
Kollar, T., et al.: Generalized grounding graphs: a probabilistic framework for understanding grounded language. In: JAIR (2013). https://people.csail.mit.edu/sachih/home/wp-content/uploads/2014/04/G3_JAIR.pdf
Kondo, Y.: Body gesture classification based on bag-of-features in frequency domain of motion. In: 2012 IEEE RO-MAN, pp. 386–391 (2012). doi:10.1109/ROMAN.2012.6343783
Luo, D., Ohya, J.: Study on human gesture recognition from moving camera images. In: 2010 IEEE International Conference on Multimedia and Expo (ICME), pp. 274–279, July 2010. doi:10.1109/ICME.2010.5582998
Mikolov, T., et al.: Efficient Estimation of Word Representations in Vector Space. arXiv:1301.3781 [cs] (16 January 2013). http://arxiv.org/abs/1301.3781. Accessed 30 Mar 2016
Palatucci, M., et al.: Zero-shot learning with semantic output codes. In: Neural Information Processing Systems (NIPS), December 2009
Sauppé, A., Mutlu, B.: Robot deictics: how gesture and context shape referential communication. In: Proceedings of the 2014 ACM/IEEE International Conference on Human-robot Interaction, HRI 2014, New York, NY, USA, pp. 342–349. ACM (2014). ISBN 978-1-4503-2658-2. doi:10.1145/2559636.2559657, http://doi.acm.org/10.1145/2559636.2559657. Accessed 19 Nov 2015
Segers, V., Connan, J.: Real-time gesture recognition using eigenvectors (2009). http://www.cs.uwc.ac.za/~jconnan/publications/Paper%2056%20-%20Segers.pdf
Socher, R., et al.: Zero-Shot Learning Through Cross-Modal Transfer. arXiv:1301.3666 [cs] (16 January 2013). http://arxiv.org/abs/1301.3666. Accessed 25 Jan 2016
Takano, W., Hamano, S., Nakamura, Y.: Correlated space formation for human whole-body motion primitives and descriptive word labels. Rob. Auton. Syst. 66, 35–43 (2015)
Mahbub, U., Imtiaz, H.: One-Shot-Learning Gesture Recognition Using Motion History Based Gesture Silhouettes (2013). doi:10.12792/iciae2013
Wan, J., et al.: One-shot learning gesture recognition from RGB-D data using bag of features. J. Mach. Learn. Res. 14(1), 2549–2582 (2013). ISSN 1532-4435. http://dl.acm.org/citation.cfm?id=2567709.2567743. Accessed 25 Jan 2016
Wu, D., Zhu, F., Shao, L.: One shot learning gesture recognition from RGBD images. In: 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 7–12, June 2012. doi:10.1109/CVPRW.2012.6239179
Wu, J.: Fusing multi-modal features for gesture recognition. In: Proceedings of the 15th ACM on International Conference on Multimodal Interaction, ICMI 2013, New York, NY, USA, pp. 453–460. ACM (2013). ISBN 978-1-4503-2129-7. doi:10.1145/2522848.2532589, http://doi.acm.org/10.1145/2522848.2532589. Accessed 31 Mar 2016
Yin, Y., Davis, R.: Gesture spotting and recognition using salience detection and concatenated hidden Markov models. In: Proceedings of the 15th ACM on International Conference on Multimodal Interaction, ICMI 2013, New York, NY, USA, pp. 489–494. ACM (2013). ISBN 978-1-4503-2129-7. doi:10.1145/2522848.2532588, http://doi.acm.org/10.1145/2522848.2532588. Accessed 22 Jan 2016
Zhou, Y., et al.: Kernel-based sparse representation for gesture recognition. Pattern Recogn. 46(12), 3208–3222 (2013). ISSN 0031-3203. doi:10.1016/j.patcog.2013.06.007, http://dx.doi.org/10.1016/j.patcog.2013. Accessed 29 Jan 2016
Acknowledgements
This material is based upon research supported by the Office of Naval Research under Award Number N00014-16-1-2080. We are grateful for this support.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Thomason, W., Knepper, R.A. (2017). Recognizing Unfamiliar Gestures for Human-Robot Interaction Through Zero-Shot Learning. In: Kulić, D., Nakamura, Y., Khatib, O., Venture, G. (eds) 2016 International Symposium on Experimental Robotics. ISER 2016. Springer Proceedings in Advanced Robotics, vol 1. Springer, Cham. https://doi.org/10.1007/978-3-319-50115-4_73
DOI: https://doi.org/10.1007/978-3-319-50115-4_73
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50114-7
Online ISBN: 978-3-319-50115-4
eBook Packages: Engineering (R0)