Hybrid two-stream dynamic CNN for view adaptive human action recognition using ensemble learning

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Human actions are sequential, structured patterns of body parts and their movements. In this paper, we present a hybrid two-stream convolutional neural network (H2SCNN) that recognizes actions from sequences by exploring statistical information such as skeletons. The aim is to exploit the skeletons fully and identify actions accurately by merging different motion-related features, namely motion features and joint features. The framework calculates the distance between consecutive sequences to form the temporal information required for recognition. The proposed H2SCNN works in two stages. In the first stage, a neighbourhood feature model processes the two inputs individually. In the second stage, ensemble learning takes advantage of the diversity of the features by fusing them together. The multi-task ensemble learning model improves the prediction ability of H2SCNN. Experiments on the benchmark dataset show the superiority of the proposed model over other recent approaches.
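
As a rough illustration of the two ideas named in the abstract (temporal information from distances between consecutive inputs, and fusion of a joint stream with a motion stream), the following is a minimal sketch and not the authors' implementation. It assumes that the distance is the per-joint Euclidean distance between consecutive frames and that fusion is a weighted average of softmax scores; the function names, the `weight` parameter, and the toy numbers are all hypothetical.

```python
import math


def motion_features(frames):
    """Per-joint Euclidean distance between consecutive frames (assumed metric).

    frames: list of frames, each frame a list of (x, y, z) joint positions.
    Returns one list of per-joint distances for each consecutive frame pair.
    """
    return [
        [math.dist(j0, j1) for j0, j1 in zip(f0, f1)]
        for f0, f1 in zip(frames, frames[1:])
    ]


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(joint_scores, motion_scores, weight=0.5):
    """Late fusion (assumed): weighted average of the two streams' probabilities."""
    pj = softmax(joint_scores)
    pm = softmax(motion_scores)
    return [weight * a + (1.0 - weight) * b for a, b in zip(pj, pm)]


# Toy skeleton: 3 frames, 2 joints each (hypothetical values).
frames = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(0.5, 0.0, 0.0), (1.0, 2.0, 1.0)],
    [(1.0, 0.0, 0.0), (1.0, 3.0, 1.0)],
]
motion = motion_features(frames)

# Hypothetical class scores from a joint stream and a motion stream.
fused = fuse_scores([2.0, 0.5, 0.1], [0.3, 1.5, 0.2])
predicted = max(range(len(fused)), key=fused.__getitem__)
```

In this sketch the two streams disagree on the top class, and the averaged probabilities decide the prediction, which is the kind of feature diversity that ensemble fusion is meant to exploit.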

Acknowledgements

This research was supported by the National Key R&D Program of China (no. 2019YFB2101802) and the National Natural Science Foundation of China (no. 61773324).

Author information

Corresponding authors

Correspondence to Zeng Yu or Tianrui Li.

About this article

Cite this article

Javed, M.H., Yu, Z., Li, T. et al. Hybrid two-stream dynamic CNN for view adaptive human action recognition using ensemble learning. Int. J. Mach. Learn. & Cyber. 13, 1157–1166 (2022). https://doi.org/10.1007/s13042-021-01441-2
