
Automated Real-Time Recognition of Non-emotional Conversational Head-Gestures for Social Robots

Conference paper
In: Proceedings of the Future Technologies Conference (FTC) 2022, Volume 3
Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 561)

Abstract

Social robotics promises to augment human caretakers in healthcare, elderly care, entertainment, and education, and to support space and deep-sea exploration. Human-robot interaction requires automated understanding of human facial expressions, speech, and non-emotional conversational gestures, together with their integration. Conversational gestures comprise conversational head-gestures, hand-gestures, gaze, lip movements, and their synchronous integration with speech. In this paper, we implement a synchronous colored Petri net model for the automated recognition of non-emotional conversational head-gestures. The scheme integrates head-motion analysis, eye-focus analysis, and their synchronization. The technique performs video analysis to derive the x and y coordinates of facial feature points, a stillness-vector, and a silence-vector in real time. These vectors are analyzed to derive a signature comprising the meta-attribute values of the corresponding synchronous Petri net graph for each gesture. These signatures are matched against archived signatures to recognize and label gestures in real time. An algorithm using a dynamic matrix-based implementation is presented. Conversational head-gestures are partitioned into multiple classes based on the combination of head-motion type, eye-focus, repeated motion, and associated speech, reducing ambiguities in gesture labeling caused by sensor inaccuracies, sampling-interval choices, and threshold limitations. A confusion matrix for a subset of gestures shows that the signatures, combined with classification on major attributes, achieve a high recall in gesture recognition.
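To make the input stage of the described pipeline concrete, the following is a minimal sketch of how per-frame facial feature coordinates might be turned into a stillness-vector and a silence-vector, and how a crude run-length summary could be read off them. This is not the authors' implementation: the nose-tip feature choice, the thresholds, and the `motion_segments` summary are illustrative assumptions, and the paper's actual signatures are derived from meta-attribute values of a synchronous colored Petri net graph rather than from raw run lengths.

```python
import numpy as np

# Illustrative thresholds (not taken from the paper).
MOTION_THRESHOLD = 2.0   # pixels of feature-point displacement per frame
ENERGY_THRESHOLD = 0.01  # normalized audio energy per frame window


def stillness_vector(nose_xy, threshold=MOTION_THRESHOLD):
    """Return a 0/1 vector: 1 where the head is (approximately) still.

    nose_xy: array of shape (T, 2) with per-frame (x, y) coordinates of a
    tracked facial feature point (e.g. the nose tip).
    """
    displacement = np.linalg.norm(np.diff(nose_xy, axis=0), axis=1)
    still = (displacement < threshold).astype(int)
    return np.concatenate(([1], still))  # first frame has no predecessor


def silence_vector(frame_energy, threshold=ENERGY_THRESHOLD):
    """Return a 0/1 vector: 1 where the audio frame is (approximately) silent."""
    return (np.asarray(frame_energy) < threshold).astype(int)


def motion_segments(still):
    """Split the stillness vector into alternating still/moving runs,
    returning (label, length) pairs as a crude per-gesture summary."""
    segments, start = [], 0
    for t in range(1, len(still) + 1):
        if t == len(still) or still[t] != still[start]:
            segments.append(("still" if still[start] else "moving", t - start))
            start = t
    return segments


if __name__ == "__main__":
    # Synthetic example: a head that is still, nods (moves vertically), then is still again.
    rng = np.random.default_rng(0)
    still_part = np.tile([320.0, 240.0], (15, 1)) + rng.normal(0, 0.3, (15, 2))
    nod_part = np.column_stack(
        [np.full(10, 320.0), 240.0 + 8 * np.sin(np.linspace(0, np.pi, 10))]
    )
    nose_xy = np.vstack([still_part, nod_part, still_part])

    energy = np.concatenate([np.full(15, 0.002), np.full(10, 0.05), np.full(15, 0.002)])

    print("stillness:", stillness_vector(nose_xy))
    print("silence:  ", silence_vector(energy))
    print("segments: ", motion_segments(stillness_vector(nose_xy)))
```

In the paper's scheme, vectors of this kind, synchronized with speech, would feed the Petri net model whose meta-attribute values form the archived signatures used for matching.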



Author information

Corresponding author: Aditi Singh


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Singh, A., Bansal, A.K. (2023). Automated Real-Time Recognition of Non-emotional Conversational Head-Gestures for Social Robots. In: Arai, K. (ed.) Proceedings of the Future Technologies Conference (FTC) 2022, Volume 3. Lecture Notes in Networks and Systems, vol. 561. Springer, Cham. https://doi.org/10.1007/978-3-031-18344-7_29
