Interactive semantics neural networks for skeleton-based human interaction recognition

Huang, Junkai; Zheng, Rui; Cheng, Youyong; Hu, Jiaqian; Hu, Weijun; Shang, Wenli; Zhang, Man; Cao, Zhong

doi:10.1007/s00371-024-03420-4

Interactive semantics neural networks for skeleton-based human interaction recognition

Research
Published: 07 May 2024

Volume 40, pages 7147–7160, (2024)
Cite this article

The Visual Computer Aims and scope Submit manuscript

Junkai Huang¹,
Rui Zheng¹,
Youyong Cheng¹,
Jiaqian Hu¹,
Weijun Hu¹,
Wenli Shang¹,
Man Zhang¹ &
…
Zhong Cao¹

359 Accesses
1 Citation
Explore all metrics

Abstract

Skeleton-based human interaction recognition is a formidable challenge that demands the capability to discern spatial, temporal, and interactive features. However, current research still faces some limitations in identifying spatial, temporal, and interaction features. Methods based on graph convolutional networks often prove to be insufficient in capturing interactive features and structural semantic information of skeletons. In order to solve this problem, we construct a Mutual-semantic Adjacency Matrix (MAM) by amalgamating the relative semantic attention of two skeleton sequences. This MAM was then integrated with the convolution of residual graphs to enhance the extraction of spatial and interaction features. We propose a novel interactive semantics neural network (ISNN) for skeleton-based human interaction recognition to hierarchically fuse MAM and structural semantic information. In addition, integrating the bone stream, we propose a two-stream Interactive Semantics Neural Network (2 s-ISNN). Experiments conducted with our models on two interaction datasets, NTU-RGB+D (mutual) and NTU-RGB+D 120 (mutual), demonstrate significantly improved recognition capabilities in comprehending human interactions. The source code is available at: https://github.com/czant1977/ISNN-master//.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

IGFormer: Interaction Graph Transformer for Skeleton-Based Human Interaction Recognition

A Two-Stream Hybrid CNN-Transformer Network for Skeleton-Based Human Interaction Recognition

MGSAN: multimodal graph self-attention network for skeleton-based action recognition

Article Open access 27 November 2024

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

References

Zhang, Z.: Microsoft kinect sensor and its effect. IEEE Multim. 19(2), 4–10 (2012)
Article Google Scholar
Kamel, A., Sheng, B., Li, P., et al.: Hybrid refinement-correction heatmaps for human pose estimation. IEEE Trans. Multim. 23, 1330–1342 (2021)
Article Google Scholar
Wu, Y., Wang, C.: Parallel-branch network for 3d human pose and shape estimation in video. Comput. Animat. Virtual Worlds 33(3–4), e2078 (2022)
Article Google Scholar
Manzi, A., Fiorini, L., Limosani, R., et al.: Two-person activity recognition using skeleton data. IET Comput. Vis. 12(1), 27–35 (2018)
Article Google Scholar
Perez, M., Liu, J., Kot, AC.: Interaction recognition through body parts relation reasoning. In: Proc. Asian Conf. Comput. Vis. Pattern Recognit., pp 268–280 (2019)
Perez, M., Liu, J., Kot, A.C.: Interaction relational network for mutual action recognition. IEEE Trans. Multim. 24, 366–376 (2022)
Article Google Scholar
Zhu, A., Wu, Q., Cui, R., et al.: Exploring a rich spatial-temporal dependent relational model for skeleton-based action recognition by bidirectional LSTM-CNN. Neurocomputing 414, 90–100 (2020)
Article Google Scholar
Liu, J., Wang, G., Duan, L., et al.: Skeleton-based human action recognition with global context-aware attention LSTM networks. IEEE Trans. Image Process. 27(4), 1586–1599 (2018)
Article MathSciNet Google Scholar
Li, J., Xie, X., Cao, Y., et al.: Knowledge embedded GCN for skeleton-based two-person interaction recognition. Neurocomputing 444, 338–348 (2021)
Article Google Scholar
Zhu, L., Wan, B., Li, C., et al.: Dyadic relational graph convolutional networks for skeleton-based human interaction recognition. Pattern Recognit. 115, 107920 (2021)
Article Google Scholar
Gao, F., Xia, H., Tang, Z.: Attention interactive graph convolutional network for skeleton-based human interaction recognition. In: Proc. IEEE Int. Conf. Multimedia Expo, pp 1–6 (2022)
Liu, M., Liu, H., Chen, C.: Enhanced skeleton visualization for view invariant human action recognition. Pattern Recognit. 68, 346–362 (2017)
Article Google Scholar
Ke, Q., Bennamoun, M., An, S., et al.: A new representation of skeleton sequences for 3d action recognition. In: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp 4570–4579 (2017)
Liu, H., Tu, J., Liu, M.: Two-stream 3d convolutional neural network for skeleton-based action recognition. (2017) arXiv:1705.08106
Li, C., Zhong, Q., Xie, D., et al.: Skeleton-based action recognition with convolutional neural networks. In: Proc. IEEE Int. Conf. Multimedia Expo Workshops, pp 597–600 (2017)
Cao, C., Lan, C., Zhang, Y., et al.: Skeleton-based action recognition with gated convolutional neural networks. IEEE Trans. Circuits Syst. Video Technol. 29(11), 3247–3257 (2019)
Article Google Scholar
Song, S., Lan, C., Xing, J., et al.: An end-to-end spatio-temporal attention model for human action recognition from skeleton data. In: Proc. AAAI Conf. Artif. Intell., pp 4263–4270 (2017)
Zhang, P., Lan, C., Xing, J., et al.: View adaptive recurrent neural networks for high performance human action recognition from skeleton data. In: Proc. IEEE Int. Conf. Comput. Vis (2017)
Si, C., Jing, Y., Wang, W., et al.: Skeleton-based action recognition with spatial reasoning and temporal stack learning. In: Proc. Eur. Conf. Comput. Vis., pp 106–121 (2018)
Li, S., Li, W., Cook, C., et al.: Independently recurrent neural network (indrnn): Building a longer and deeper RNN. In: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp 5457–5466 (2018)
Li, L., Zheng, W., Zhang, Z., et al.: Skeleton-based relational modeling for action recognition. (2018) arXiv:1805.02556
Yan, S., Xiong, Y., Lin, D.: Spatial temporal graph convolutional networks for skeleton-based action recognition. In: Proc. AAAI Conf. Artif. Intell., pp 7444–7452 (2018)
Li, C., Cui, Z., Zheng, W., et al.: Spatio-temporal graph convolution for skeleton based action recognition. In: Proc. AAAI Conf. Artif. Intell., pp 3482–3489 (2018)
Shi, L., Zhang, Y., Cheng, J., et al.: Two-stream adaptive graph convolutional networks for skeleton-based action recognition. In: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp 12026–12035 (2019)
Gao, X., Hu, W., Tang, J., et al.: Optimized skeleton-based action recognition via sparsified graph regression. In: Proc. ACM 27th Int. Conf. Multimedia, pp 601–610 (2019)
Li, M., Chen, S., Chen, X., et al.: Actional-structural graph convolutional networks for skeleton-based action recognition. In: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp 3595–3603 (2019)
Liu, J., Shahroudy, A., Xu, D., et al.: Skeleton-based action recognition using spatio-temporal LSTM network with trust gates. IEEE Trans. Pattern Anal. Mach. Intell. 40(12), 3007–3021 (2018)
Article Google Scholar
Liu, J., Wang, G., Hu, P., et al.: Global context-aware attention LSTM networks for 3d action recognition. In: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp 3671–3680 (2017)
Zhang, P., Lan, C., Zeng, W., et al.: Semantics-guided neural networks for efficient skeleton-based human action recognition. In: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp 1109–1118 (2020)
Liu, Z., Zhang, H., Chen, Z., et al.: Disentangling and unifying graph convolutions for skeleton-based action recognition. In: Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Computer Vision Foundation / IEEE, pp 140–149 (2020)
Chen, Z., Li, S., Yang, B., et al.: Multi-scale spatial temporal graph convolutional network for skeleton-based action recognition. In: AAAI. AAAI Press, pp 1113–1122 (2021)
Lee, J., Lee, M., Lee, D., et al.: Hierarchically decomposed graph convolutional networks for skeleton-based action recognition. CoRR abs/2208.10741 (2022)
Weng, J., Liu, M., Jiang, X., et al.: Deformable pose traversal convolution for 3d action and gesture recognition. In: Proc. 15th Eur. Conf. Comput. Vis., pp 142–157 (2018)
Zhang, P., Lan, C., Xing, J., et al.: View adaptive neural networks for high performance skeleton-based human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 41(8), 1963–1978 (2019)
Article Google Scholar
Shahroudy, A., Liu, J., Ng, T., et al.: NTU RGB+D: A large scale dataset for 3d human activity analysis. In: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp 1010–1019 (2016)
Liu, J., Shahroudy, A., Perez, M., et al.: NTU RGB+D 120: a large-scale benchmark for 3d human activity understanding. IEEE Trans. Pattern Anal. Mach. Intell. 42(10), 2684–2701 (2020)
Article Google Scholar
Kipf, TN., Welling, M.: Semi-supervised classification with graph convolutional networks. In: Proc. 5th Int. Conf. Learn. Represent (2017)
Zhang, Z., Chen, D., Wang, J., et al.: Quantum-based subgraph convolutional neural networks. Pattern Recognit. 88, 38–49 (2019)
Article Google Scholar
Wu, J., Zhong, S., Liu, Y.: Dynamic graph convolutional network for multi-video summarization. Pattern Recognit. 107, 107382 (2020)
Article Google Scholar
Bin, Y., Chen, Z., Wei, X., et al.: Structure-aware human pose estimation with graph convolutional networks. Pattern Recognit. 106, 107410 (2020)
Article Google Scholar
Manessi, F., Rozza, A., Manzo, M.: Dynamic graph convolutional networks. Pattern Recognit 97 (2020)
Wang, H., Wang, L.: Learning content and style: joint action recognition and person identification from human skeletons. Pattern Recognit. 81, 23–35 (2018)
Article Google Scholar
Shuman, D.I., Narang, S.K., Frossard, P., et al.: The emerging field of signal processing on graphs: extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Process. Mag. 30(3), 83–98 (2013)
Article Google Scholar
Henaff, M., Bruna, J., LeCun, Y.: Deep convolutional networks on graph-structured data. (2015) arXiv:1506.05163
Bruna, J., Zaremba, W., Szlam, A., et al.: Spectral networks and locally connected networks on graphs. In: Proc. 2nd Int. Conf. Learn. Represent (2014)
Defferrard, M., Bresson, X., Vandergheynst, P.: Convolutional neural networks on graphs with fast localized spectral filtering. In: Proc. Int. Conf. Neural Inf. Process. Syst., pp 3837–3845 (2016)
Niepert, M., Ahmed, M., Kutzkov, K.: Learning convolutional neural networks for graphs. In: Proc. 33nd Int. Conf. Mach. Learn., pp 2014–2023 (2016)
Hamilton, WL., Ying, Z., Leskovec, J.: Inductive representation learning on large graphs. In: Proc. Int. Conf. Neural Inf. Process. Syst., pp 1024–1034 (2017)
Monti, F., Boscaini, D., Masci, J., et al.: Geometric deep learning on graphs and manifolds using mixture model cnns. In: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp 5425–5434 (2017)
Duvenaud, D., Maclaurin, D., Aguilera-Iparraguirre, J., et al.: Convolutional networks on graphs for learning molecular fingerprints. In: Proc. Int. Conf. Neural Inf. Process. Syst., pp 2224–2232 (2015)
Kipf, TN., Fetaya, E., Wang, K., et al.: Neural relational inference for interacting systems. In: Proc. 35th Int. Conf. Mach. Learn., pp 2693–2702 (2018)
Yun, K., Honorio, J., Chattopadhyay, D., et al.: Two-person interaction detection using body-pose features and multiple instance learning. In: Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, pp 28–35 (2012)
Ji, Y., Ye, G., Cheng, H.: Interactive body part contrast mining for human interaction recognition. In: Proc. IEEE Int. Conf. Multimedia Expo Workshops, pp 1–6 (2014)
Wu, H., Shao, J., Xu, X., et al.: Recognition and detection of two-person interactive actions using automatically selected skeleton features. IEEE Trans. Hum. Mach. Syst. 48(3), 304–310 (2018)
Article Google Scholar
Vaswani, A., Shazeer, N., Parmar, N., et al.: Attention is all you need. In: Proc. Int. Conf. Neural Inf. Process. Syst., pp 5998–6008 (2017)
Zheng, H., Fu, J., Zha, Z., et al.: Learning deep bilinear transformation for fine-grained image representation. In: Proc. Int. Conf. Neural Inf. Process. Syst., pp 4279–4288 (2019)
Zhang, P., Lan, C., Zeng, W., et al.: Multi-scale semantics-guided neural networks for efficient skeleton-based human action recognition. (2021) arXiv:2111.03993
Chen, H., Jing, L.: Light-weight enhanced semantics-guided neural networks for skeleton-based human action recognition. In: MCSoC, pp 190–196 (2021)
Xu, Q., Liu, F., Fu, Z., et al.: Aes-gcn: Attention-enhanced semantic-guided graph convolutional networks for skeleton-based action recognition. Comput. Animat Virtual Worlds 33(3-4) (2022)
Wang, X., Gupta, A.: Videos as space-time region graphs. In: Proc. 15th Eur. Conf. Comput. Vis., pp 413–431 (2018)
Wang, X., Girshick, RB., Gupta, A., et al.: Non-local neural networks. In: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp 7794–7803 (2018)
Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Proc. 32nd Int. Conf. Mach. Learn., pp 448–456 (2015)
He, T., Zhang, Z., Zhang, H., et al.: Bag of tricks for image classification with convolutional neural networks. In: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp 558–567 (2019)

Download references

Acknowledgements

This work was supported in part by National Key Research and Development Program of China under Grant 2021YFB2012400, the National Natural Science Foundation of China under Grant 62173101, the Basic and Applied Basic Research Funding of Guangdong Province under Grant 2022A1515011558 and Grant 2022A1515010865, the Guangzhou Science and Technology Funding under Grant 202201020217, the Key Laboratory of Guangdong Higher Education Institutes under Grant 2023KSYS002.

Author information

Authors and Affiliations

Guangzhou University, School of Electronics and Communication Engineering, Guangzhou, 510006, Guangdong, China
Junkai Huang, Rui Zheng, Youyong Cheng, Jiaqian Hu, Weijun Hu, Wenli Shang, Man Zhang & Zhong Cao

Authors

Junkai Huang
View author publications
You can also search for this author inPubMed Google Scholar
Rui Zheng
View author publications
You can also search for this author inPubMed Google Scholar
Youyong Cheng
View author publications
You can also search for this author inPubMed Google Scholar
Jiaqian Hu
View author publications
You can also search for this author inPubMed Google Scholar
Weijun Hu
View author publications
You can also search for this author inPubMed Google Scholar
Wenli Shang
View author publications
You can also search for this author inPubMed Google Scholar
Man Zhang
View author publications
You can also search for this author inPubMed Google Scholar
Zhong Cao
View author publications
You can also search for this author inPubMed Google Scholar

Corresponding author

Correspondence to Zhong Cao.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Huang, J., Zheng, R., Cheng, Y. et al. Interactive semantics neural networks for skeleton-based human interaction recognition. Vis Comput 40, 7147–7160 (2024). https://doi.org/10.1007/s00371-024-03420-4

Download citation

Accepted: 15 April 2024
Published: 07 May 2024
Issue Date: October 2024
DOI: https://doi.org/10.1007/s00371-024-03420-4

Keywords

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Interactive semantics neural networks for skeleton-based human interaction recognition

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

IGFormer: Interaction Graph Transformer for Skeleton-Based Human Interaction Recognition

A Two-Stream Hybrid CNN-Transformer Network for Skeleton-Based Human Interaction Recognition

MGSAN: multimodal graph self-attention network for skeleton-based action recognition

Explore related subjects

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now