Abstract
Human action recognition methods often focus on extracting structural and temporal information from skeleton-based graphs. However, these approaches struggle to capture and process the extensive information that arises during action transitions. To overcome this limitation, we propose LMS-GAT, a novel approach that facilitates information exchange through node concentration and diffusion across the structural and temporal dimensions. By selectively suppressing and reinstating the representations of structural nodes for each specific action, and by using hierarchical shifted temporal windows to assess temporal information, LMS-GAT addresses the challenge of dynamic change in action recognition. Experimental evaluation on the NTU RGB+D 60 and 120 datasets shows that LMS-GAT outperforms state-of-the-art methods in prediction accuracy, highlighting the efficacy of our approach in capturing and recognizing human actions.
Supported by the National Natural Science Foundation of China under Grants 62002074 and 62072452, by the Shenzhen Science and Technology Program under Grant JCYJ20200109115627045, and in part by the Regional Joint Fund of Guangdong under Grant 2021B1515120011.
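The hierarchical shifted temporal windows mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation; the function name, shapes, and pooling choice below are illustrative assumptions showing how half-overlapping temporal windows aggregate per-frame skeleton features:

```python
import numpy as np

def shifted_temporal_windows(x, window, shift):
    """Partition a per-frame feature sequence of shape (T, C) into
    temporal windows of length `window`, advancing the start by
    `shift` frames (shift < window gives overlapping, shifted
    windows), and mean-pool each window to one feature vector."""
    T, C = x.shape
    starts = range(0, T - window + 1, shift)
    return np.stack([x[s:s + window].mean(axis=0) for s in starts])

# Toy sequence: 8 frames, 3 feature channels per frame.
x = np.arange(24, dtype=float).reshape(8, 3)

# Half-overlapping windows: window of 4 frames, shifted by 2.
pooled = shifted_temporal_windows(x, window=4, shift=2)
print(pooled.shape)  # (3, 3): three shifted windows, pooled over time
```

Stacking such pooled outputs at several window sizes would give the hierarchical, multi-scale temporal view the abstract describes; the attention-based aggregation used in LMS-GAT is not reproduced here.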
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Huang, T., Kong, W., Liang, J., Ding, Z., Li, H., Zhang, X. (2024). Lightweight Multispectral Skeleton and Multi-stream Graph Attention Networks for Enhanced Action Prediction with Multiple Modalities. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14425. Springer, Singapore. https://doi.org/10.1007/978-981-99-8429-9_6
DOI: https://doi.org/10.1007/978-981-99-8429-9_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8428-2
Online ISBN: 978-981-99-8429-9