Abstract
Visual human-centered information is a fundamental input source for effective human-robot interaction (HRI) in dynamic multi-party social settings. Torso and head pose, as forms of nonverbal communication, support the derivation of people’s focus of attention, a key variable in the analysis of human behaviour in HRI paradigms encompassing social aspects. Towards this goal, we have developed a model-based approach for torso and head pose estimation to overcome key limitations in free-form interaction scenarios and issues of partial intra- and inter-person occlusions. The proposed approach builds upon the concept of Top View Re-projection (TVR) to uniformly treat the respective body parts, modelled as cylinders. For each body part, a number of pose hypotheses are sampled from its configuration space. Each pose hypothesis is evaluated against a scoring function, and the hypothesis with the best score yields the assumed pose and the location of the joints. A refinement step on head pose is applied, based on tracking facial patch deformations, to compute the horizontal off-plane rotation. The overall approach forms one of the core components of a vision system integrated in a robotic platform that supports socially appropriate, multi-party, multimodal interaction in a bartending scenario. Results obtained in the robot’s environment during real HRI experiments with a varying number of users attest to the effectiveness of our approach.
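The sample-and-score step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the scoring function, the two-parameter configuration space (the top-view centre of a cylinder of known radius), and all function names are assumptions made for illustration. A cylinder seen from above projects to a circle, so the hypothetical score below rewards centre hypotheses whose circle passes close to the observed top-view points.

```python
import math
import random


def score(hypothesis, points, radius):
    """Hypothetical scoring function (an assumption, not the paper's): higher is better.

    Returns the negated mean distance between the observed top-view points
    and the circle of the given radius centred at the hypothesis.
    """
    cx, cy = hypothesis
    residuals = [abs(math.hypot(px - cx, py - cy) - radius) for px, py in points]
    return -sum(residuals) / len(residuals)


def best_hypothesis(points, radius, bounds, n_samples=5000, seed=0):
    """Sample centre hypotheses uniformly from `bounds` and keep the best-scoring one."""
    rng = random.Random(seed)
    (xmin, xmax), (ymin, ymax) = bounds
    hypotheses = [
        (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(n_samples)
    ]
    return max(hypotheses, key=lambda h: score(h, points, radius))


def synthetic_top_view(center, radius, n_points=40):
    """Synthetic data: points on one half of the circle, mimicking a
    cylinder surface that is only partially visible to the sensor."""
    cx, cy = center
    angles = [math.pi * i / (n_points - 1) for i in range(n_points)]
    return [(cx + radius * math.cos(a), cy + radius * math.sin(a)) for a in angles]


# Usage: recover an illustrative torso centre from a half-visible cross-section.
true_center = (0.3, 1.5)   # metres, illustrative values only
torso_radius = 0.15        # metres, illustrative value only
observed = synthetic_top_view(true_center, torso_radius)
estimate = best_hypothesis(observed, torso_radius, bounds=((0.0, 1.0), (1.0, 2.0)))
print("estimated centre:", estimate)
```

Uniform random sampling is used here only for brevity; the abstract does not state how hypotheses are drawn from the configuration space, nor how the scoring function is defined.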
Acknowledgments
This work was partially supported by the European Commission under contract number FP7-270435 (JAMES project).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Sigalas, M., Pateraki, M., Trahanias, P. (2015). Visual Estimation of Attentive Cues in HRI: The Case of Torso and Head Pose. In: Nalpantidis, L., Krüger, V., Eklundh, JO., Gasteratos, A. (eds) Computer Vision Systems. ICVS 2015. Lecture Notes in Computer Science, vol 9163. Springer, Cham. https://doi.org/10.1007/978-3-319-20904-3_34
DOI: https://doi.org/10.1007/978-3-319-20904-3_34
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20903-6
Online ISBN: 978-3-319-20904-3
eBook Packages: Computer Science, Computer Science (R0)