Time-Dependent Body Gesture Representation for Video Emotion Recognition

  • Conference paper
  • MultiMedia Modeling (MMM 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12572)

Abstract

Video emotion recognition has recently become a research hotspot in affective computing. Although most studies focus on facial cues, body gestures are the only available cues in some settings, such as video surveillance systems. In this paper, we propose a body gesture representation method based on body joint movements. To reduce model complexity and aid the understanding of video emotion, the method represents body gestures with body joint information and captures the time-dependent relationships among body joints. Furthermore, we propose an attention-based channelwise convolutional neural network (ACCNN) that retains the independent characteristics of each body joint and learns key body gesture features. Experimental results on the multimodal database of Emotional Speech, Video and Gestures (ESVG) demonstrate the effectiveness of the proposed method: the accuracy achieved with body gesture features is comparable to that achieved with facial features.
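The abstract outlines two components: a joint-based gesture representation that tracks each body joint over time, and a channelwise convolution with attention so that per-joint features stay independent while informative joints are emphasized. The PyTorch sketch below is a rough illustration of that idea, not the authors' architecture: it treats each body-joint coordinate trajectory (e.g., as extracted by a pose estimator such as OpenPose, ref. 6) as one input channel, filters each channel independently with a depthwise temporal convolution, and re-weights the joint channels with a squeeze-and-excitation style attention block (cf. ref. 12). All names and sizes here (ACCNNSketch, 25 joints, 64 frames, 7 emotion classes) are hypothetical.

```python
import torch
import torch.nn as nn

class ACCNNSketch(nn.Module):
    """Illustrative sketch of an attention-based channelwise CNN.

    Input shape: (batch, num_joints * coords, num_frames) -- each channel
    is one body-joint coordinate trajectory over time. groups=channels
    keeps per-joint features independent; a squeeze-and-excitation block
    weights the joint channels. Layer sizes are assumptions, not the
    authors' configuration.
    """

    def __init__(self, channels: int = 50, num_frames: int = 64, num_classes: int = 7):
        super().__init__()
        # Channelwise temporal convolution: with groups=channels, each
        # joint trajectory is filtered by its own kernel.
        self.temporal = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size=5, padding=2, groups=channels),
            nn.BatchNorm1d(channels),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        # Squeeze-and-excitation style attention over joint channels.
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool1d(1),            # squeeze time: (B, C, 1)
            nn.Flatten(),                       # (B, C)
            nn.Linear(channels, channels // 4),
            nn.ReLU(),
            nn.Linear(channels // 4, channels),
            nn.Sigmoid(),                       # per-joint weights in (0, 1)
        )
        self.classifier = nn.Linear(channels * (num_frames // 2), num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.temporal(x)                    # (B, C, T/2)
        w = self.attn(h).unsqueeze(-1)          # (B, C, 1) channel weights
        h = h * w                               # emphasize key joint channels
        return self.classifier(h.flatten(1))    # emotion logits

# Example: 25 joints x 2 coordinates = 50 channels, 64-frame clips.
logits = ACCNNSketch()(torch.randn(8, 50, 64))
print(logits.shape)  # torch.Size([8, 7])
```

Depthwise grouping is one plausible reading of "channelwise convolution"; the authors' exact attention mechanism and layer configuration may differ.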

References

  1. Aviezer, H., Trope, Y., Todorov, A.: Body cues, not facial expressions, discriminate between intense positive and negative emotions. Science 338(6111), 1225–1229 (2012). https://doi.org/10.1126/science.1224313

  2. Barros, P., Jirak, D., Weber, C., Wermter, S.: Multimodal emotional state recognition using sequence-dependent deep hierarchical features. Neural Netw. 72, 140–151 (2015). https://doi.org/10.1016/j.neunet.2015.09.009

  3. Barros, P., Parisi, G., Weber, C., Wermter, S.: Emotion-modulated attention improves expression recognition: a deep learning model. Neurocomputing 253, 104–114 (2017). https://doi.org/10.1016/j.neucom.2017.01.096

  4. de Gelder, B.: Why bodies? Twelve reasons for including bodily expressions in affective neuroscience. Philos. Trans. R. Soc. Lond. Ser. B Biol. Sci. 364(1535), 3475–3484 (2009). https://doi.org/10.1098/rstb.2009.0190

  5. Camurri, A., Lagerlöf, I., Volpe, G.: Recognizing emotion from dance movement: comparison of spectator recognition and automated techniques. Int. J. Hum. Comput. Stud. 59(1–2), 213–225 (2003). https://doi.org/10.1016/S1071-5819(03)00050-8

  6. Cao, Z., Simon, T., Wei, S.E., Sheikh, Y.: Realtime multi-person 2D pose estimation using part affinity fields. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1302–1310, July 2017. https://doi.org/10.1109/CVPR.2017.143

  7. Deng, J.J., Leung, C.H.C., Mengoni, P., Li, Y.: Emotion recognition from human behaviors using attention model. In: 2018 IEEE First International Conference on Artificial Intelligence and Knowledge Engineering (AIKE), pp. 249–253, September 2018. https://doi.org/10.1109/AIKE.2018.00056

  8. Ekman, P.: Mistakes when deceiving. Ann. N. Y. Acad. Sci. 364(1), 269–278 (1981). https://doi.org/10.1111/j.1749-6632.1981.tb34479.x

  9. Filntisis, P.P., Efthymiou, N., Koutras, P., Potamianos, G., Maragos, P.: Fusing body posture with facial expressions for joint recognition of affect in child-robot interaction. IEEE Robot. Autom. Lett. 4(4), 4011–4018 (2019). https://doi.org/10.1109/LRA.2019.2930434

  10. Gunes, H., Piccardi, M.: A bimodal face and body gesture database for automatic analysis of human nonverbal affective behavior. In: 18th International Conference on Pattern Recognition (ICPR 2006), pp. 1148–1153, August 2006. https://doi.org/10.1109/ICPR.2006.39

  11. Gunes, H., Piccardi, M.: Fusing face and body gesture for machine recognition of emotions. In: ROMAN 2005. IEEE International Workshop on Robot and Human Interactive Communication, 2005, pp. 306–311, October 2005. https://doi.org/10.1109/ROMAN.2005.1513796

  12. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141, June 2018. https://doi.org/10.1109/CVPR.2018.00745

  13. Izard, C.E., Ackerman, B.P., Schoff, K.M., Fine, S.E.: Self-Organization of Discrete Emotions, Emotion Patterns, and Emotion-Cognition Relations, pp. 15–36. Cambridge Studies in Social and Emotional Development, Cambridge University Press (2000). https://doi.org/10.1017/CBO9780511527883.003

  14. Li, J., Wang, C., Zhu, H., Mao, Y., Fang, H.S., Lu, C.: CrowdPose: efficient crowded scenes pose estimation and a new benchmark. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10855–10864, June 2019. https://doi.org/10.1109/CVPR.2019.01112

  15. Matsumoto, D., Frank, M., Hwang, H.: Nonverbal Communication: Science and Applications. Sage Publications (2012). https://doi.org/10.4135/9781452244037

  16. Nass, C., Jonsson, I.M., Harris, H., Reaves, B., Endo, J., Brave, S., Takayama, L.: Improving automotive safety by pairing driver emotion and car voice emotion. In: CHI ’05 Extended Abstracts on Human Factors in Computing Systems, CHI EA 2005, New York, NY, USA, pp. 1973–1976. Association for Computing Machinery (2005). https://doi.org/10.1145/1056808.1057070

  17. Pease, B., Pease, A.: The Definitive Book of Body Language: The Hidden Meaning Behind People’s Gestures and Expressions. Bantam (2008)

  18. Piana, S., Staglianò, A., Odone, F., Camurri, A.: Adaptive body gesture representation for automatic emotion recognition. ACM Trans. Interact. Intell. Syst. 6(1), 1–31 (2016). https://doi.org/10.1145/2818740

  19. Psaltis, A., Kaza, K., Stefanidis, K., Thermos, S., Apostolakis, K.C.: Multimodal affective state recognition in serious games applications. In: IEEE International Conference on Imaging Systems and Techniques, pp. 435–439, October 2016. https://doi.org/10.1109/IST.2016.7738265

  20. Saha, S., Datta, S., Konar, A., Janarthanan, R.: A study on emotion recognition from body gestures using kinect sensor. In: 2014 International Conference on Communication and Signal Processing, pp. 056–060, April 2014. https://doi.org/10.1109/ICCSP.2014.6949798

  21. Sapiński, T., Kamińska, D., Pelikant, A., Ozcinar, C., Avots, E., Anbarjafari, G.: Multimodal database of emotional speech, video and gestures. In: International Conference on Pattern Recognition, pp. 153–163, August 2018

  22. Siegman, A.W., Feldstein, S.: Nonverbal Behavior and Communication. Psychology Press (2014)

  23. Sun, B., Cao, S., He, J., Yu, L.: Affect recognition from facial movements and body gestures by hierarchical deep spatio-temporal features and fusion strategy. Neural Netw. 105, 36–51 (2017). https://doi.org/10.1016/j.neunet.2017.11.021

  24. Weng, J., Liu, M., Jiang, X., Yuan, J.: Deformable pose traversal convolution for 3D action and gesture recognition. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 142–157. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01234-2_9

Author information

Correspondence to Xinyu Yang.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Wei, J., Yang, X., Dong, Y. (2021). Time-Dependent Body Gesture Representation for Video Emotion Recognition. In: Lokoč, J., et al. (eds.) MultiMedia Modeling. MMM 2021. Lecture Notes in Computer Science, vol. 12572. Springer, Cham. https://doi.org/10.1007/978-3-030-67832-6_33

  • DOI: https://doi.org/10.1007/978-3-030-67832-6_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-67831-9

  • Online ISBN: 978-3-030-67832-6

  • eBook Packages: Computer Science, Computer Science (R0)
