Abstract
Human action recognition methods often focus on extracting structural and temporal information from skeleton-based graphs. However, these approaches struggle to capture and process the extensive information that arises during action transitions. To overcome this limitation, we propose LMS-GAT, a novel approach that facilitates information exchange through node concentration and diffusion across the structural and temporal dimensions. By selectively suppressing and reinstating the representations of structural nodes for each specific action, and by using hierarchical shifted temporal windows to assess temporal information, LMS-GAT addresses the challenge of dynamic change in action recognition. Experimental evaluation on the NTU RGB+D 60 and 120 datasets shows that LMS-GAT outperforms state-of-the-art methods in prediction accuracy, highlighting the efficacy of our approach in capturing and recognizing human actions.
Supported by the National Natural Science Foundation of China under Grants 62002074 and 62072452, by the Shenzhen Science and Technology Program under Grant JCYJ20200109115627045, and in part by the Regional Joint Fund of Guangdong under Grant 2021B1515120011.
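The hierarchical shifted temporal windows mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation; the function name, shapes, and pooling choice below are illustrative assumptions showing how half-overlapping temporal windows aggregate per-frame skeleton features:

```python
import numpy as np

def shifted_temporal_windows(x, window, shift):
    """Partition a per-frame feature sequence of shape (T, C) into
    temporal windows of length `window`, advancing the start by
    `shift` frames (shift < window gives overlapping, shifted
    windows), and mean-pool each window to one feature vector."""
    T, C = x.shape
    starts = range(0, T - window + 1, shift)
    return np.stack([x[s:s + window].mean(axis=0) for s in starts])

# Toy sequence: 8 frames, 3 feature channels per frame.
x = np.arange(24, dtype=float).reshape(8, 3)

# Half-overlapping windows: window of 4 frames, shifted by 2.
pooled = shifted_temporal_windows(x, window=4, shift=2)
print(pooled.shape)  # (3, 3): three shifted windows, pooled over time
```

Stacking such pooled outputs at several window sizes would give the hierarchical, multi-scale temporal view the abstract describes; the attention-based aggregation used in LMS-GAT is not reproduced here.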
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Huang, T., Kong, W., Liang, J., Ding, Z., Li, H., Zhang, X. (2024). Lightweight Multispectral Skeleton and Multi-stream Graph Attention Networks for Enhanced Action Prediction with Multiple Modalities. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14425. Springer, Singapore. https://doi.org/10.1007/978-981-99-8429-9_6
DOI: https://doi.org/10.1007/978-981-99-8429-9_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8428-2
Online ISBN: 978-981-99-8429-9