Abstract
Video emotion recognition has recently become a research hotspot in the field of affective computing. Although most studies focus on facial cues, body gestures are the only available cues in some scenarios, such as video surveillance. In this paper, we propose a body gesture representation method based on body joint movements. To reduce model complexity and promote the understanding of video emotion, this method uses body joint information to represent body gestures and captures the time-dependent relationships among body joints. Furthermore, we propose an attention-based channelwise convolutional neural network (ACCNN) that retains the independent characteristics of each body joint and learns key body gesture features. Experimental results on the multimodal database of Emotional Speech, Video and Gestures (ESVG) demonstrate the effectiveness of the proposed method, with body gesture features achieving accuracy comparable to that of facial features.
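The abstract's core idea, treating each body joint's trajectory as an independent channel, filtering channels separately, and re-weighting them with attention, can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation; the joint count, kernel size, and attention form (squeeze-and-excitation style pooling plus softmax) are assumptions.

```python
import numpy as np

# Hypothetical sketch of a channelwise conv + attention over body joints.
# Each of J joints contributes its (x, y) trajectory over T frames as
# separate channels; a channelwise 1-D convolution filters every channel
# independently (joints are never mixed), then softmax attention over
# channels highlights the key joints.

rng = np.random.default_rng(0)
T, J = 32, 18          # frames, joints (e.g. 2D pose estimators often emit ~18 joints)
K = 3                  # temporal kernel size (assumed)
C = 2 * J              # one channel per coordinate per joint

# Joint trajectories: (C, T)
x = rng.standard_normal((C, T))

# Channelwise convolution: one independent kernel per channel, so each
# joint keeps its own temporal filter and its independent characteristics.
kernels = rng.standard_normal((C, K))
feat = np.stack([np.convolve(x[c], kernels[c], mode="same")
                 for c in range(C)])              # (C, T)

# Squeeze-and-excitation-style attention: pool each channel to a scalar,
# score it, softmax across channels, then re-scale the channels.
pooled = feat.mean(axis=1)                        # (C,)
w = rng.standard_normal(C)                        # attention scoring weights
scores = pooled * w
attn = np.exp(scores - scores.max())
attn /= attn.sum()                                # softmax over channels
out = feat * attn[:, None]                        # attended joint features

print(out.shape)
```

The channelwise (depthwise) convolution is what keeps joints independent; a standard convolution would mix all joint channels in every output feature, which is exactly what the paper's design avoids.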
References
Aviezer, H., Trope, Y., Todorov, A.: Body cues, not facial expressions, discriminate between intense positive and negative emotions. Science 338(6111), 1225–1229 (2012). https://doi.org/10.1126/science.1224313
Barros, P., Jirak, D., Weber, C., Wermter, S.: Multimodal emotional state recognition using sequence-dependent deep hierarchical features. Neural Netw. 72, 140–151 (2015). https://doi.org/10.1016/j.neunet.2015.09.009
Barros, P., Parisi, G., Weber, C., Wermter, S.: Emotion-modulated attention improves expression recognition: a deep learning model. Neurocomputing 253, 104–114 (2017). https://doi.org/10.1016/j.neucom.2017.01.096
Beatrice, D.G.: Why bodies? Twelve reasons for including bodily expressions in affective neuroscience. Philos. Trans. R. Soc. Lond. Ser. B Biol. Sci. 364(1535), 3475–3484 (2009). https://doi.org/10.1098/rstb.2009.0190
Camurri, A., Lagerlöf, I., Volpe, G.: Recognizing emotion from dance movement: comparison of spectator recognition and automated techniques. Int. J. Hum. Comput. Stud. 59(1–2), 213–225 (2003). https://doi.org/10.1016/S1071-5819(03)00050-8
Cao, Z., Simon, T., Wei, S.E., Sheikh, Y.: Realtime multi-person 2D pose estimation using part affinity fields. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1302–1310, July 2017. https://doi.org/10.1109/CVPR.2017.143
Deng, J.J., Leung, C.H.C., Mengoni, P., Li, Y.: Emotion recognition from human behaviors using attention model. In: 2018 IEEE First International Conference on Artificial Intelligence and Knowledge Engineering (AIKE), pp. 249–253, September 2018. https://doi.org/10.1109/AIKE.2018.00056
Ekman, P.: Mistakes when deceiving. Ann. N. Y. Acad. Sci. 364(1), 269–278 (1981). https://doi.org/10.1111/j.1749-6632.1981.tb34479.x
Filntisis, P.P., Efthymiou, N., Koutras, P., Potamianos, G., Maragos, P.: Fusing body posture with facial expressions for joint recognition of affect in child-robot interaction. IEEE Robot. Autom. Lett. 4(4), 4011–4018 (2019). https://doi.org/10.1109/LRA.2019.2930434
Gunes, H., Piccardi, M.: A bimodal face and body gesture database for automatic analysis of human nonverbal affective behavior. In: 18th International Conference on Pattern Recognition (ICPR 2006), pp. 1148–1153, August 2006. https://doi.org/10.1109/ICPR.2006.39
Gunes, H., Piccardi, M.: Fusing face and body gesture for machine recognition of emotions. In: ROMAN 2005. IEEE International Workshop on Robot and Human Interactive Communication, 2005, pp. 306–311, October 2005. https://doi.org/10.1109/ROMAN.2005.1513796
Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141, June 2018. https://doi.org/10.1109/CVPR.2018.00745
Izard, C.E., Ackerman, B.P., Schoff, K.M., Fine, S.E.: Self-organization of discrete emotions, emotion patterns, and emotion-cognition relations. In: Cambridge Studies in Social and Emotional Development, pp. 15–36. Cambridge University Press (2000). https://doi.org/10.1017/CBO9780511527883.003
Li, J., Wang, C., Zhu, H., Mao, Y., Fang, H.S., Lu, C.: CrowdPose: efficient crowded scenes pose estimation and a new benchmark. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10855–10864, June 2019. https://doi.org/10.1109/CVPR.2019.01112
Matsumoto, D., Frank, M., Hwang, H.: Nonverbal Communication: Science and Applications. Sage Publications (2012). https://doi.org/10.4135/9781452244037
Nass, C., Jonsson, I.M., Harris, H., Reaves, B., Endo, J., Brave, S., Takayama, L.: Improving automotive safety by pairing driver emotion and car voice emotion. In: CHI ’05 Extended Abstracts on Human Factors in Computing Systems, CHI EA 2005, New York, NY, USA, pp. 1973–1976. Association for Computing Machinery (2005). https://doi.org/10.1145/1056808.1057070
Pease, B., Pease, A.: The Definitive Book of Body Language: The Hidden Meaning Behind People’s Gestures and Expressions. Bantam (2008)
Piana, S., Staglianò, A., Odone, F., Camurri, A.: Adaptive body gesture representation for automatic emotion recognition. ACM Trans. Interact. Intell. Syst. 6(1), 1–31 (2016). https://doi.org/10.1145/2818740
Psaltis, A., Kaza, K., Stefanidis, K., Thermos, S., Apostolakis, K.C.: Multimodal affective state recognition in serious games applications. In: IEEE International Conference on Imaging Systems and Techniques, pp. 435–439, October 2016. https://doi.org/10.1109/IST.2016.7738265
Saha, S., Datta, S., Konar, A., Janarthanan, R.: A study on emotion recognition from body gestures using Kinect sensor. In: 2014 International Conference on Communication and Signal Processing, pp. 056–060, April 2014. https://doi.org/10.1109/ICCSP.2014.6949798
Sapiński, T., Kamińska, D., Pelikant, A., Ozcinar, C., Avots, E., Anbarjafari, G.: Multimodal database of emotional speech, video and gestures. In: International Conference on Pattern Recognition, pp. 153–163, August 2018
Siegman, A.W., Feldstein, S.: Nonverbal Behavior and Communication. Psychology Press (2014)
Sun, B., Cao, S., He, J., Yu, L.: Affect recognition from facial movements and body gestures by hierarchical deep spatio-temporal features and fusion strategy. Neural Netw. 105, 36–51 (2017). https://doi.org/10.1016/j.neunet.2017.11.021
Weng, J., Liu, M., Jiang, X., Yuan, J.: Deformable pose traversal convolution for 3D action and gesture recognition. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 142–157. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01234-2_9
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Wei, J., Yang, X., Dong, Y. (2021). Time-Dependent Body Gesture Representation for Video Emotion Recognition. In: Lokoč, J., et al. MultiMedia Modeling. MMM 2021. Lecture Notes in Computer Science(), vol 12572. Springer, Cham. https://doi.org/10.1007/978-3-030-67832-6_33
DOI: https://doi.org/10.1007/978-3-030-67832-6_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67831-9
Online ISBN: 978-3-030-67832-6
eBook Packages: Computer Science (R0)