Abstract
Video emotion recognition has recently become a research hotspot in the field of affective computing. Although most studies focus on facial cues, body gestures are the only available cues in some scenarios, such as video surveillance. In this paper, we propose a body gesture representation method based on body joint movements. To reduce model complexity and promote the understanding of video emotion, this method uses body joint information to represent body gestures and captures the time-dependent relationships among body joints. Furthermore, we propose an attention-based channelwise convolutional neural network (ACCNN) that retains the independent characteristics of each body joint and learns key body gesture features. Experimental results on the multimodal database of Emotional Speech, Video and Gestures (ESVG) demonstrate the effectiveness of the proposed method, with body gesture features achieving accuracy comparable to that of facial features.
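The abstract's core idea, treating each body joint's trajectory as an independent channel, filtering channels separately, and re-weighting them with attention, can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation; the joint count, kernel size, and attention form (squeeze-and-excitation style pooling plus softmax) are assumptions.

```python
import numpy as np

# Hypothetical sketch of a channelwise conv + attention over body joints.
# Each of J joints contributes its (x, y) trajectory over T frames as
# separate channels; a channelwise 1-D convolution filters every channel
# independently (joints are never mixed), then softmax attention over
# channels highlights the key joints.

rng = np.random.default_rng(0)
T, J = 32, 18          # frames, joints (e.g. 2D pose estimators often emit ~18 joints)
K = 3                  # temporal kernel size (assumed)
C = 2 * J              # one channel per coordinate per joint

# Joint trajectories: (C, T)
x = rng.standard_normal((C, T))

# Channelwise convolution: one independent kernel per channel, so each
# joint keeps its own temporal filter and its independent characteristics.
kernels = rng.standard_normal((C, K))
feat = np.stack([np.convolve(x[c], kernels[c], mode="same")
                 for c in range(C)])              # (C, T)

# Squeeze-and-excitation-style attention: pool each channel to a scalar,
# score it, softmax across channels, then re-scale the channels.
pooled = feat.mean(axis=1)                        # (C,)
w = rng.standard_normal(C)                        # attention scoring weights
scores = pooled * w
attn = np.exp(scores - scores.max())
attn /= attn.sum()                                # softmax over channels
out = feat * attn[:, None]                        # attended joint features

print(out.shape)
```

The channelwise (depthwise) convolution is what keeps joints independent; a standard convolution would mix all joint channels in every output feature, which is exactly what the paper's design avoids.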
References
Aviezer, H., Trope, Y., Todorov, A.: Body cues, not facial expressions, discriminate between intense positive and negative emotions. Science 338(6111), 1225–1229 (2012). https://doi.org/10.1126/science.1224313
Barros, P., Jirak, D., Weber, C., Wermter, S.: Multimodal emotional state recognition using sequence-dependent deep hierarchical features. Neural Netw. 72, 140–151 (2015). https://doi.org/10.1016/j.neunet.2015.09.009
Barros, P., Parisi, G., Weber, C., Wermter, S.: Emotion-modulated attention improves expression recognition: a deep learning model. Neurocomputing 253, 104–114 (2017). https://doi.org/10.1016/j.neucom.2017.01.096
Beatrice, D.G.: Why bodies? Twelve reasons for including bodily expressions in affective neuroscience. Philos. Trans. R. Soc. Lond. Ser. B Biol. Sci. 364(1535), 3475–3484 (2009). https://doi.org/10.1098/rstb.2009.0190
Camurri, A., Lagerlöf, I., Volpe, G.: Recognizing emotion from dance movement: comparison of spectator recognition and automated techniques. Int. J. Hum. Comput. Stud. 59(1–2), 213–225 (2003). https://doi.org/10.1016/S1071-5819(03)00050-8
Cao, Z., Simon, T., Wei, S.E., Sheikh, Y.: Realtime multi-person 2D pose estimation using part affinity fields. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1302–1310, July 2017. https://doi.org/10.1109/CVPR.2017.143
Deng, J.J., Leung, C.H.C., Mengoni, P., Li, Y.: Emotion recognition from human behaviors using attention model. In: 2018 IEEE First International Conference on Artificial Intelligence and Knowledge Engineering (AIKE), pp. 249–253, September 2018. https://doi.org/10.1109/AIKE.2018.00056
Ekman, P.: Mistakes when deceiving. Ann. N. Y. Acad. Sci. 364(1), 269–278 (1981). https://doi.org/10.1111/j.1749-6632.1981.tb34479.x
Filntisis, P.P., Efthymiou, N., Koutras, P., Potamianos, G., Maragos, P.: Fusing body posture with facial expressions for joint recognition of affect in child-robot interaction. IEEE Robot. Autom. Lett. 4(4), 4011–4018 (2019). https://doi.org/10.1109/LRA.2019.2930434
Gunes, H., Piccardi, M.: A bimodal face and body gesture database for automatic analysis of human nonverbal affective behavior. In: 18th International Conference on Pattern Recognition (ICPR 2006), pp. 1148–1153, August 2006. https://doi.org/10.1109/ICPR.2006.39
Gunes, H., Piccardi, M.: Fusing face and body gesture for machine recognition of emotions. In: ROMAN 2005. IEEE International Workshop on Robot and Human Interactive Communication, 2005, pp. 306–311, October 2005. https://doi.org/10.1109/ROMAN.2005.1513796
Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141, June 2018. https://doi.org/10.1109/CVPR.2018.00745
Izard, C.E., Ackerman, B.P., Schoff, K.M., Fine, S.E.: Self-organization of discrete emotions, emotion patterns, and emotion-cognition relations. In: Cambridge Studies in Social and Emotional Development, pp. 15–36. Cambridge University Press (2000). https://doi.org/10.1017/CBO9780511527883.003
Li, J., Wang, C., Zhu, H., Mao, Y., Fang, H.S., Lu, C.: CrowdPose: efficient crowded scenes pose estimation and a new benchmark. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10855–10864, June 2019. https://doi.org/10.1109/CVPR.2019.01112
Matsumoto, D., Frank, M., Hwang, H.: Nonverbal Communication: Science and Applications. Sage Publications (2012). https://doi.org/10.4135/9781452244037
Nass, C., Jonsson, I.M., Harris, H., Reaves, B., Endo, J., Brave, S., Takayama, L.: Improving automotive safety by pairing driver emotion and car voice emotion. In: CHI ’05 Extended Abstracts on Human Factors in Computing Systems, CHI EA 2005, New York, NY, USA, pp. 1973–1976. Association for Computing Machinery (2005). https://doi.org/10.1145/1056808.1057070
Pease, B., Pease, A.: The Definitive Book of Body Language: The Hidden Meaning Behind People’s Gestures and Expressions. Bantam (2008)
Piana, S., Staglianò, A., Odone, F., Camurri, A.: Adaptive body gesture representation for automatic emotion recognition. ACM Trans. Interact. Intell. Syst. 6(1), 1–31 (2016). https://doi.org/10.1145/2818740
Psaltis, A., Kaza, K., Stefanidis, K., Thermos, S., Apostolakis, K.C.: Multimodal affective state recognition in serious games applications. In: IEEE International Conference on Imaging Systems and Techniques, pp. 435–439, October 2016. https://doi.org/10.1109/IST.2016.7738265
Saha, S., Datta, S., Konar, A., Janarthanan, R.: A study on emotion recognition from body gestures using Kinect sensor. In: 2014 International Conference on Communication and Signal Processing, pp. 056–060, April 2014. https://doi.org/10.1109/ICCSP.2014.6949798
Sapiński, T., Kamińska, D., Pelikant, A., Ozcinar, C., Avots, E., Anbarjafari, G.: Multimodal database of emotional speech, video and gestures. In: International Conference on Pattern Recognition, pp. 153–163, August 2018
Siegman, A.W., Feldstein, S.: Nonverbal Behavior and Communication. Psychology Press (2014)
Sun, B., Cao, S., He, J., Yu, L.: Affect recognition from facial movements and body gestures by hierarchical deep spatio-temporal features and fusion strategy. Neural Netw. 105, 36–51 (2017). https://doi.org/10.1016/j.neunet.2017.11.021
Weng, J., Liu, M., Jiang, X., Yuan, J.: Deformable pose traversal convolution for 3D action and gesture recognition. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 142–157. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01234-2_9
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Wei, J., Yang, X., Dong, Y. (2021). Time-Dependent Body Gesture Representation for Video Emotion Recognition. In: Lokoč, J., et al. MultiMedia Modeling. MMM 2021. Lecture Notes in Computer Science(), vol 12572. Springer, Cham. https://doi.org/10.1007/978-3-030-67832-6_33
DOI: https://doi.org/10.1007/978-3-030-67832-6_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67831-9
Online ISBN: 978-3-030-67832-6
eBook Packages: Computer Science (R0)