
Automated Real-Time Recognition of Non-emotional Conversational Head-Gestures for Social Robots

Conference paper
In: Proceedings of the Future Technologies Conference (FTC) 2022, Volume 3
Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 561)

Abstract

Social robotics promises to augment human caretakers in healthcare, elderly care, entertainment, and education, and to support space and deep-sea exploration. Human-robot interaction requires automated understanding of human facial expressions, speech, and non-emotional conversational gestures, together with their integration. Conversational gestures comprise conversational head-gestures, hand-gestures, gaze, lip movements, and their synchronous integration with speech. In this paper, we implement a synchronous colored Petri net model for the automated recognition of non-emotional conversational head-gestures. The scheme integrates head-motion analysis, eye-focus analysis, and their synchronization. The technique performs video analysis to derive the x and y coordinates of facial feature points, a stillness-vector, and a silence-vector in real time. These vectors are analyzed to derive a signature comprising the meta-attribute values of the corresponding synchronous Petri net graph for each gesture. These signatures are matched against archived signatures to recognize and label gestures in real time. An algorithm using a dynamic matrix-based implementation is presented. Conversational head-gestures are partitioned into multiple classes based on the combination of head-motion type, eye-focus, repeated motion, and associated speech, reducing ambiguities in gesture labeling caused by sensor inaccuracies, sampling-interval choices, and threshold limitations. A confusion matrix for a subset of gestures shows that the signatures, combined with classification on major attributes, achieve a high recall in gesture recognition.
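To make the input stage of the described pipeline concrete, the following is a minimal sketch of how per-frame facial feature coordinates might be turned into a stillness-vector and a silence-vector, and how a crude run-length summary could be read off them. This is not the authors' implementation: the nose-tip feature choice, the thresholds, and the `motion_segments` summary are illustrative assumptions, and the paper's actual signatures are derived from meta-attribute values of a synchronous colored Petri net graph rather than from raw run lengths.

```python
import numpy as np

# Illustrative thresholds (not taken from the paper).
MOTION_THRESHOLD = 2.0   # pixels of feature-point displacement per frame
ENERGY_THRESHOLD = 0.01  # normalized audio energy per frame window


def stillness_vector(nose_xy, threshold=MOTION_THRESHOLD):
    """Return a 0/1 vector: 1 where the head is (approximately) still.

    nose_xy: array of shape (T, 2) with per-frame (x, y) coordinates of a
    tracked facial feature point (e.g. the nose tip).
    """
    displacement = np.linalg.norm(np.diff(nose_xy, axis=0), axis=1)
    still = (displacement < threshold).astype(int)
    return np.concatenate(([1], still))  # first frame has no predecessor


def silence_vector(frame_energy, threshold=ENERGY_THRESHOLD):
    """Return a 0/1 vector: 1 where the audio frame is (approximately) silent."""
    return (np.asarray(frame_energy) < threshold).astype(int)


def motion_segments(still):
    """Split the stillness vector into alternating still/moving runs,
    returning (label, length) pairs as a crude per-gesture summary."""
    segments, start = [], 0
    for t in range(1, len(still) + 1):
        if t == len(still) or still[t] != still[start]:
            segments.append(("still" if still[start] else "moving", t - start))
            start = t
    return segments


if __name__ == "__main__":
    # Synthetic example: a head that is still, nods (moves vertically), then is still again.
    rng = np.random.default_rng(0)
    still_part = np.tile([320.0, 240.0], (15, 1)) + rng.normal(0, 0.3, (15, 2))
    nod_part = np.column_stack(
        [np.full(10, 320.0), 240.0 + 8 * np.sin(np.linspace(0, np.pi, 10))]
    )
    nose_xy = np.vstack([still_part, nod_part, still_part])

    energy = np.concatenate([np.full(15, 0.002), np.full(10, 0.05), np.full(15, 0.002)])

    print("stillness:", stillness_vector(nose_xy))
    print("silence:  ", silence_vector(energy))
    print("segments: ", motion_segments(stillness_vector(nose_xy)))
```

In the paper's scheme, vectors of this kind, synchronized with speech, would feed the Petri net model whose meta-attribute values form the archived signatures used for matching.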



Author information

Corresponding author: Aditi Singh


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Singh, A., Bansal, A.K. (2023). Automated Real-Time Recognition of Non-emotional Conversational Head-Gestures for Social Robots. In: Arai, K. (ed.) Proceedings of the Future Technologies Conference (FTC) 2022, Volume 3. Lecture Notes in Networks and Systems, vol. 561. Springer, Cham. https://doi.org/10.1007/978-3-031-18344-7_29
