Abstract
Human gait is the manner in which an individual walks, and observers can infer useful information from everyday walking activities. Recently, emotion recognition based on gait skeletons has attracted much attention, and many methods have been proposed. Skeleton-based representations offer several advantages for recognition tasks: they are extremely lightweight and can be extracted directly from video data using off-the-shelf algorithms. Moreover, skeleton data is not tied to any specific cultural or ethnic context, which has made it increasingly popular in recent years for cross-cultural studies and related applications. To process this type of data effectively, many researchers have turned to Graph Convolutional Networks (GCNs), which leverage the topological structure of the data and improve performance by modeling the relationships between joints and body parts as a graph, allowing them to capture complex spatial and temporal patterns. In this work, we construct an efficient multi-stream GCN framework for the emotion recognition task. We exploit the complementary effect among streams using a multi-thread attention (MTA) method, which improves emotion recognition performance. In addition, the proposed MTA graph convolution layer extracts effective features from the topology of the graph to further improve recognition performance. The proposed method outperforms state-of-the-art methods on a challenging benchmark dataset.
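The abstract's core building block, modeling skeleton joints as graph nodes and aggregating neighbor features with a graph convolution, can be illustrated with a minimal sketch. This is not the paper's actual MTA layer; it is a generic single spatial graph convolution in the spirit of ST-GCN-style models, with an illustrative 5-joint toy skeleton and feature sizes chosen arbitrarily for the example.

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetrically normalize adjacency with self-loops:
    A_hat = D^{-1/2} (A + I) D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def graph_conv(X, A_norm, W):
    """One spatial graph-convolution step: aggregate neighbor
    features along the skeleton graph, then apply a learnable
    linear map and a ReLU.
    X: (num_joints, in_channels), W: (in_channels, out_channels)."""
    return np.maximum(A_norm @ X @ W, 0.0)

# Toy 5-joint skeleton (a small chain with one branch); edges and
# joint count are illustrative assumptions, not the paper's graph.
edges = [(0, 1), (1, 2), (2, 3), (2, 4)]
A = np.zeros((5, 5))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
A_norm = normalize_adjacency(A)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))   # 3-D joint coordinates as input features
W = rng.standard_normal((3, 8))   # project to 8 feature channels
out = graph_conv(X, A_norm, W)
print(out.shape)                  # (5, 8): one 8-d feature per joint
```

A multi-stream framework like the one described above would run several such stacks in parallel (e.g. on joint positions, velocities, and bone vectors) and fuse their features, which is where an attention mechanism over streams can weigh their complementary contributions.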
Acknowledgements
This research has been supported by the National Key Research and Development Project of China (Grant No. 2021ZD0110505), the Natural Key Research and Development Project of Zhejiang Province (Grant No. 2023C01043), and the Ningbo Natural Science Foundation (Grant Nos. 2022Z072 and 2023Z236).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Lu, J., Wang, Z., Zhang, Z., Du, Y., Zhou, Y., Wang, Z. (2024). Emotion Recognition via 3D Skeleton Based Gait Analysis Using Multi-thread Attention Graph Convolutional Networks. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14429. Springer, Singapore. https://doi.org/10.1007/978-981-99-8469-5_6
Print ISBN: 978-981-99-8468-8
Online ISBN: 978-981-99-8469-5