Abstract
Human communication is highly multimodal, encompassing speech, gesture, gaze, facial expression, and body language. Robots serving as human teammates must act on such multimodal communicative input, even when the message is not clear from any single modality. In this paper, we explore a method for better understanding complex, situated communication by leveraging coordinated natural language, gesture, and context. These three channels have largely been treated separately, but considering them jointly can yield gains in comprehension [1, 12].
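The kind of joint consideration the abstract describes can be illustrated with a minimal sketch (not the paper's actual model): treating each modality as independent evidence about a referent and combining per-modality likelihoods with a naive-Bayes-style product. All object names and scores below are hypothetical.

```python
# Illustrative sketch, assuming independent per-modality evidence over
# a fixed set of candidate referents. Not the method from the paper.

def fuse_modalities(candidates, modality_scores):
    """Combine per-modality likelihoods P(observation | referent).

    candidates: list of referent names
    modality_scores: dict mapping modality -> {referent: likelihood}
    Returns a normalized posterior over candidates (uniform prior).
    """
    posterior = {c: 1.0 for c in candidates}
    for scores in modality_scores.values():
        for c in candidates:
            # Small floor so a missing score does not zero out a candidate.
            posterior[c] *= scores.get(c, 1e-9)
    total = sum(posterior.values())
    return {c: p / total for c, p in posterior.items()}

# Speech alone is ambiguous ("the red one") and gesture alone is coarse,
# but their product concentrates probability on a single referent.
scores = {
    "speech":  {"red_mug": 0.45, "red_block": 0.45, "blue_mug": 0.10},
    "gesture": {"red_mug": 0.70, "red_block": 0.10, "blue_mug": 0.20},
}
fused = fuse_modalities(["red_mug", "red_block", "blue_mug"], scores)
best = max(fused, key=fused.get)
```

Here neither modality resolves the referent on its own, but the fused posterior does; the paper pursues this intuition with coordinated language, gesture, and context rather than this toy product model.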
References
Artzi, Y., Zettlemoyer, L.: UW SPF: The University of Washington Semantic Parsing Framework (2013)
Chen, Q., Georganas, N.D., Petriu, E.M.: Real-time vision-based hand gesture recognition using haar-like features. In: Instrumentation and Measurement Technology Conference Proceedings, IMTC 2007, pp. 1–6. IEEE, May 2007. doi:10.1109/IMTC.2007.379068
Eldon, M., Whitney, D., Tellex, S.: Interpreting Multimodal Referring Expressions in Real Time (2015). https://edge.edx.org/assetv1:Brown+CSCI2951-K+2015_T2+type@asset+block@eldon15.pdf
Escalera, S., et al.: Chalearn multi-modal gesture recognition 2013: grand challenge and workshop summary. In: Proceedings of the 15th ACM on International Conference on Multimodal Interaction, pp. 365–368. ACM (2013)
Escalera, S., et al.: Multi-modal gesture recognition challenge 2013: dataset and results. In: Proceedings of the 15th ACM on International Conference on Multimodal Interaction, pp. 445–452. ACM (2013)
Gawron, P., et al.: Eigengestures for natural human computer interface. arXiv:1105.1293 [cs] 103, pp. 49–56 (2011). doi:10.1007/978-3-642-23169-8_6, http://arxiv.org/abs/1105.1293. Accessed 29 Oct 2015
Ge, S.S., Yang, Y., Lee, T.H.: Hand gesture recognition and tracking based on distributed locally linear embedding. Image Vis. Comput. 26(12), 1607–1620 (2008). ISSN 0262-8856. doi:10.1016/j.imavis.2008.03.004, http://www.sciencedirect.com/science/article/pii/S0262885608000693. Accessed 18 Nov 2015
Guyon, I., et al.: The ChaLearn gesture dataset (CGD 2011). Mach. Vis. Appl. 25(8), 1929–1951 (2014). ISSN 0932-8092, 1432-1769. doi:10.1007/s00138-014-0596-3, http://link.springer.com/article/10.1007/s00138-014-0596-3. Accessed 02 Mar 2016
Huang, C.-M., Mutlu, B.: Modeling and evaluating narrative gestures for humanlike robots. In: Robotics: Science and Systems (2013)
Jetley, S., et al.: Prototypical Priors: From Improving Classification to Zero-Shot Learning. arXiv:1512.01192 [cs] (3 December 2015). http://arxiv.org/abs/1512.01192. Accessed 29 Jan 2016
Jones, K.S.: A statistical interpretation of term specificity and its application in retrieval. J. Documentation 28, 11–21 (1972)
Kollar, T., et al.: Generalized grounding graphs: a probabilistic framework for understanding grounded language. In: JAIR (2013). https://people.csail.mit.edu/sachih/home/wp-content/uploads/2014/04/G3_JAIR.pdf
Kondo, Y.: Body gesture classification based on bag-of-features in frequency domain of motion. In: 2012 IEEE RO-MAN, pp. 386–391 (2012). doi:10.1109/ROMAN.2012.6343783
Luo, D., Ohya, J.: Study on human gesture recognition from moving camera images. In: 2010 IEEE International Conference on Multimedia and Expo (ICME), pp. 274–279, July 2010. doi:10.1109/ICME.2010.5582998
Mikolov, T., et al.: Efficient Estimation of Word Representations in Vector Space. arXiv:1301.3781 [cs] (16 January 2013). http://arxiv.org/abs/1301.3781. Accessed 30 Mar 2016
Palatucci, M., et al.: Zero-shot learning with semantic output codes. In: Neural Information Processing Systems (NIPS), December 2009
Sauppé, A., Mutlu, B.: Robot deictics: how gesture and context shape referential communication. In: Proceedings of the 2014 ACM/IEEE International Conference on Human-robot Interaction, HRI 2014, New York, NY, USA, pp. 342–349. ACM (2014). ISBN 978-1-4503-2658-2. doi:10.1145/2559636.2559657, http://doi.acm.org/10.1145/2559636.2559657. Accessed 19 Nov 2015
Segers, V., Connan, J.: Real-time gesture recognition using eigenvectors (2009). http://www.cs.uwc.ac.za/~jconnan/publications/Paper%2056%20-%20Segers.pdf
Socher, R., et al.: Zero-Shot Learning Through Cross-Modal Transfer. arXiv:1301.3666 [cs] (16 January 2013). http://arxiv.org/abs/1301.3666. Accessed 25 Jan 2016
Takano, W., Hamano, S., Nakamura, Y.: Correlated space formation for human whole-body motion primitives and descriptive word labels. Rob. Auton. Syst. 66, 35–43 (2015)
Mahbub, U., Imtiaz, H.: One-Shot-Learning Gesture Recognition Using Motion History Based Gesture Silhouettes (2013). doi:10.12792/iciae2013
Wan, J., et al.: One-shot learning gesture recognition from RGB-D data using bag of features. J. Mach. Learn. Res. 14(1), 2549–2582 (2013). ISSN 1532-4435. http://dl.acm.org/citation.cfm?id=2567709.2567743. Accessed 25 Jan 2016
Wu, D., Zhu, F., Shao, L.: One shot learning gesture recognition from RGBD images. In: 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 7–12, June 2012. doi:10.1109/CVPRW.2012.6239179
Wu, J.: Fusing multi-modal features for gesture recognition. In: Proceedings of the 15th ACM on International Conference on Multimodal Interaction, ICMI 2013, New York, NY, USA, pp. 453–460. ACM (2013). ISBN 978-1-4503-2129-7. doi:10.1145/2522848.2532589, http://doi.acm.org/10.1145/2522848.2532589. Accessed 31 Mar 2016
Yin, Y., Davis, R.: Gesture spotting and recognition using salience detection and concatenated hidden Markov models. In: Proceedings of the 15th ACM on International Conference on Multimodal Interaction, ICMI 2013, New York, NY, USA, pp. 489–494. ACM (2013). ISBN 978-1-4503-2129-7. doi:10.1145/2522848.2532588, http://doi.acm.org/10.1145/2522848.2532588. Accessed 22 Jan 2016
Zhou, Y., et al.: Kernel-based sparse representation for gesture recognition. Pattern Recogn. 46(12), 3208–3222 (2013). ISSN 0031-3203. doi:10.1016/j.patcog.2013.06.007, http://dx.doi.org/10.1016/j.patcog.2013. Accessed 29 Jan 2016
Acknowledgements
This material is based upon research supported by the Office of Naval Research under Award Number N00014-16-1-2080. We are grateful for this support.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Thomason, W., Knepper, R.A. (2017). Recognizing Unfamiliar Gestures for Human-Robot Interaction Through Zero-Shot Learning. In: Kulić, D., Nakamura, Y., Khatib, O., Venture, G. (eds) 2016 International Symposium on Experimental Robotics. ISER 2016. Springer Proceedings in Advanced Robotics, vol 1. Springer, Cham. https://doi.org/10.1007/978-3-319-50115-4_73
DOI: https://doi.org/10.1007/978-3-319-50115-4_73
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50114-7
Online ISBN: 978-3-319-50115-4
eBook Packages: Engineering (R0)