Abstract
Social robotics promises to augment human caretakers in healthcare, elderly care, entertainment, education, and space and deep-sea exploration. Automated understanding of human facial expressions, speech, and non-emotional conversational gestures, together with their integration, is necessary for human-robot interaction. Conversational gestures comprise conversational head-gestures, hand-gestures, gaze, and lip movements, and their synchronous integration with speech. In this paper, we implement a synchronous colored Petri net model for automated recognition of non-emotional conversational head-gestures. The scheme integrates head-motion analysis, eye-focus analysis, and their synchronization. The technique analyzes video in real time to derive the x and y coordinates of facial feature points, a stillness-vector, and a silence-vector. These vectors are analyzed to derive, for each gesture, a signature comprising meta-attribute values of the corresponding synchronous Petri net graph. These signatures are matched against archived signatures to recognize and label gestures in real time. An algorithm using a dynamic matrix-based implementation is presented. Conversational head-gestures are partitioned into multiple classes based upon combinations of head-motion type, eye-focus, repeated motion, and associated speech, to reduce ambiguities in gesture labeling caused by sensor inaccuracies, sampling-interval choices, and threshold limitations. A confusion matrix for a subset of gestures shows that signatures and classification on major attributes achieve a high percentage of recall in gesture recognition.
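To make the signature-matching pipeline concrete, the minimal Python sketch below illustrates one way a stillness-vector and a coarse gesture signature could be derived from tracked feature-point coordinates and matched against archived signatures. All function names, thresholds, and archived entries here are hypothetical illustrations, not the authors' implementation or the paper's actual signature definition.

```python
import numpy as np

# Hypothetical threshold; the paper's sampling interval and limits differ.
MOTION_EPS = 2.0   # per-frame feature-point displacement (pixels) treated as "still"

def stillness_vector(xy):
    """xy: (T, 2) array of a facial feature point's (x, y) per frame.
    Returns a binary vector per sampling interval: 1 = still, 0 = moving."""
    disp = np.linalg.norm(np.diff(xy, axis=0), axis=1)
    return (disp < MOTION_EPS).astype(int)

def signature(xy, still):
    """Toy signature: dominant motion axis, number of motion segments
    (a proxy for repetition), and duration in frames."""
    dx = np.abs(np.diff(xy[:, 0])).sum()
    dy = np.abs(np.diff(xy[:, 1])).sum()
    axis = "horizontal" if dx > dy else "vertical"
    padded = np.r_[1, still, 1]                      # pad with "still" at both ends
    starts = np.flatnonzero((padded[:-1] == 1) & (padded[1:] == 0))
    return {"axis": axis, "segments": int(starts.size), "frames": len(xy)}

# Hypothetical archived signatures for two head-gesture classes.
ARCHIVE = {
    "nod":   {"axis": "vertical",   "min_segments": 2},
    "shake": {"axis": "horizontal", "min_segments": 2},
}

def label_gesture(sig):
    """Return the first archived gesture whose major attributes match, else None."""
    for name, ref in ARCHIVE.items():
        if sig["axis"] == ref["axis"] and sig["segments"] >= ref["min_segments"]:
            return name
    return None

# Example: synthetic vertical oscillation of the nose tip, roughly nod-like.
t = np.arange(30)
xy = np.stack([np.full(30, 100.0), 100.0 + 8.0 * np.sin(t / 3.0)], axis=1)
sig = signature(xy, stillness_vector(xy))
print(sig, "->", label_gesture(sig))
```

In the paper, the signature additionally encodes eye-focus, the silence-vector, and synchronization constraints from the colored Petri net; the sketch only shows the general shape of deriving attribute values from motion vectors and matching them against an archive.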
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Singh, A., Bansal, A.K. (2023). Automated Real-Time Recognition of Non-emotional Conversational Head-Gestures for Social Robots. In: Arai, K. (ed.) Proceedings of the Future Technologies Conference (FTC) 2022, Volume 3. Lecture Notes in Networks and Systems, vol. 561. Springer, Cham. https://doi.org/10.1007/978-3-031-18344-7_29
DOI: https://doi.org/10.1007/978-3-031-18344-7_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-18343-0
Online ISBN: 978-3-031-18344-7
eBook Packages: Intelligent Technologies and Robotics (R0)