Interactive semantics neural networks for skeleton-based human interaction recognition

Research article · The Visual Computer

Abstract

Skeleton-based human interaction recognition is a challenging task that requires discerning spatial, temporal, and interactive features, yet current methods remain limited in all three respects. In particular, methods based on graph convolutional networks often fail to capture interactive features and the structural semantic information of skeletons. To address this problem, we construct a Mutual-semantic Adjacency Matrix (MAM) by amalgamating the relative semantic attention of two skeleton sequences. The MAM is then integrated with residual graph convolution to enhance the extraction of spatial and interaction features. Building on this, we propose a novel Interactive Semantics Neural Network (ISNN) for skeleton-based human interaction recognition that hierarchically fuses the MAM with structural semantic information. By additionally incorporating the bone stream, we further propose a two-stream Interactive Semantics Neural Network (2s-ISNN). Experiments on two interaction datasets, NTU-RGB+D (mutual) and NTU-RGB+D 120 (mutual), show that our models significantly improve the recognition of human interactions. The source code is available at: https://github.com/czant1977/ISNN-master//.




Acknowledgements

This work was supported in part by the National Key Research and Development Program of China under Grant 2021YFB2012400, the National Natural Science Foundation of China under Grant 62173101, the Basic and Applied Basic Research Funding of Guangdong Province under Grants 2022A1515011558 and 2022A1515010865, the Guangzhou Science and Technology Funding under Grant 202201020217, and the Key Laboratory of Guangdong Higher Education Institutes under Grant 2023KSYS002.

Author information

Corresponding author

Correspondence to Zhong Cao.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Huang, J., Zheng, R., Cheng, Y. et al. Interactive semantics neural networks for skeleton-based human interaction recognition. Vis Comput 40, 7147–7160 (2024). https://doi.org/10.1007/s00371-024-03420-4

