Abstract
Human actions are sequential, and structured patterns of the body parts and their movements. In this paper, we present a hybrid two-stream convolutional neural network (H2SCNN) for the recognition of actions from sequences by exploring the statistical information like skeletons. This aims to exploit the skeletons completely and identify the actions properly by merging the different motion related features. These features include motion and joint features. The framework calculates the distance between consecutive sequences to form the temporal information required for the recognition process. The proposed H2SCNN is based on two stages. The neighbourhood feature model will be used to process both inputs individually in the first step. In the second stage, it performs ensemble learning and takes advantage of the diversity of multiple features by fusing them together. The multi-task ensemble learning model helps the system to improve the prediction ability of H2SCNN. Experiments on the benchmark dataset have shown the superiority of the proposed model with other recent approaches.
Similar content being viewed by others
References
Atwood J, Towsley D (2016) Diffusion-convolutional neural networks. In: Advances in neural information processing systems, pp 1993–2001
Caetano C, Sena J, Brémond F, Dos Santos JA, Schwartz WR (2019) Skelemotion: a new representation of skeleton joint sequences based on motion information for 3d action recognition. In: 2019 16th IEEE international conference on advanced video and signal based surveillance (AVSS). IEEE, pp 1–8
De Jong M, Joss S, Schraven D, Zhan C, Weijnen M (2015) Sustainable-smart-resilient-low carbon-eco-knowledge cities; making sense of a multitude of concepts promoting sustainable urbanization. J Clean Prod 109:25–38
Ding W, Hu B, Liu H, Wang X, Huang X (2020) Human posture recognition based on multiple features and rule learning. Int J Mach Learn Cybern 11:2529–2540
Ding Z, Wang P, Ogunbona PO, Li W (2017) Investigation of different skeleton features for CNN-based 3D action recognition. In: 2017 IEEE international conference on multimedia & expo workshops (ICMEW). IEEE, pp 617–622
Du Y, Wang W, Wang L (2015) Hierarchical recurrent neural network for skeleton based action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1110–1118
Duvenaud D, Maclaurin D, Aguilera-Iparraguirre J, Gómez-Bombarelli R, Hirzel T, Aspuru-Guzik A, Adams RP (2015) Convolutional networks on graphs for learning molecular fingerprints. CoRR. arXiv:1509.09292
Engel JI, Martin J, Barco R (2016) A low-complexity vision-based system for real-time traffic monitoring. IEEE Trans Intell Transp Syst 18(5):1279–1288
Gedamu K, Ji Y, Yang Y, Gao L, Shen HT (2021) Arbitrary-view human action recognition via novel-view action generation. Pattern Recognit 118:108043
Hamilton WL, Ying R, Leskovec J (2017) Inductive representation learning on large graphs. In: Proceedings of the 31st international conference on neural information processing systems, pp 1025–1035
Jiang Y, Xu J, Zhang T (2020) View-independent representation with frame interpolation method for skeleton-based human action recognition. Int J Mach Learn Cybern 11(12):2625–2636
Jin D, Liu Z, Li W, He D, Zhang W (2019) Graph convolutional networks meet Markov random fields: semi-supervised community detection in attribute networks. In: Proceedings of the AAAI conference on artificial intelligence, vol 33, pp 152–159
Ke Q, Bennamoun M, An S, Sohel F, Boussaid F (2017) A new representation of skeleton sequences for 3D action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3288–3297
Kipf T, Fetaya E, Wang K-C, Welling M, Zemel R (2018) Neural relational inference for interacting systems. In: International conference on machine learning. PMLR, pp 2688–2697
Lee I, Kim D, Kang S, Lee S (2017) Ensemble deep learning for skeleton-based action recognition using temporal sliding LSTM networks. In: Proceedings of the IEEE international conference on computer vision, pp 1012–1020
Li B, Dai Y, Cheng X, Chen H, Lin Y, He M (2017) Skeleton based action recognition using translation-scale invariant image mapping and multi-scale deep CNN. In: 2017 IEEE international conference on multimedia and expo workshops. IEEE, pp 601–604
Li C, Hou Y, Wang P, Li W (2017) Joint distance maps based action recognition with convolutional neural networks. IEEE Signal Process Lett 24(5):624–628
Li C, Zhong Q, Xie D, Pu S (2017) Skeleton-based action recognition with convolutional neural networks. In: 2017 IEEE international conference on multimedia and expo workshops. IEEE, pp 597–600
Liang D, Fan G, Lin G, Chen W, Pan X, Zhu H (2019) Three-stream convolutional neural network with multi-task and ensemble learning for 3D action recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops (2019)
Liu X, Li Y, Xia R (2020) Rotation-based spatial-temporal feature learning from skeleton sequences for action recognition. Signal Image Video Process 14(6):1227–1234
Monti F, Boscaini D, Masci J, Rodola E, Svoboda J, Bronstein MM (2017) Geometric deep learning on graphs and manifolds using mixture model CNNs. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5115–5124
Niepert M, Ahmed M, Kutzkov K (2016) Learning convolutional neural networks for graphs. In: International conference on machine learning. PMLR, pp 2014–2023
Nievas EB, Suarez OD, García GB, Sukthankar R (2011) Violence detection in video using computer vision techniques. In: International conference on computer analysis of images and patterns. Springer, Berlin, pp 332–339
Ren Z, Zhang Q, Gao X, Hao P, Cheng J (2021) Multi-modality learning for human action recognition. Multimedia Tools Appl 80(11):16185–16203
Shahroudy A, Liu J, Ng T-T, Wang G (2016) NTU RGB+D: a large scale dataset for 3D human activity analysis. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1010–1019
Shi L, Zhang Y, Cheng J, Lu H (2019) Two-stream adaptive graph convolutional networks for skeleton-based action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 12026–12035
Si C, Jing Y, Wang W, Wang L, Tan T (2020) Skeleton-based action recognition with hierarchical spatial reasoning and temporal stack learning network. Pattern Recognit 107:107511
Wan Y, Yu Z, Wang Y, Li X (2020) Action recognition based on two-stream convolutional networks with long-short-term spatiotemporal features. IEEE Access 8:85284–85293
Wang H, Wang L (2017) Modeling temporal dynamics and spatial configurations of actions using two-stream recurrent neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 499–508
Wang P, Li W, Li C, Hou Y (2018) Action recognition based on joint trajectory maps with convolutional neural networks. Knowl Based Syst 158:43–53
Xu Y, Cheng J, Wang L, Xia H, Liu F, Tao D (2018) Ensemble one-dimensional convolution neural networks for skeleton-based action recognition. IEEE Signal Process Lett 25(7):1044–1048
Yan S, Xiong Y, Lin D (2018) Spatial temporal graph convolutional networks for skeleton-based action recognition. In: Thirty-second AAAI conference on artificial intelligence
Zhu W, Lan C, Xing J, Zeng W, Li Y, Shen L, Xie X (2016) Co-occurrence feature learning for skeleton based action recognition using regularized deep LSTM networks. In: Proceedings of the AAAI conference on artificial intelligence, vol 30, pp 3697–3703
Acknowledgements
This research was supported by the National Key R&D Program of China (no. 2019YFB2101802) and the National Natural Science Foundation of China (no. 61773324).
Author information
Authors and Affiliations
Corresponding authors
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Javed, M.H., Yu, Z., Li, T. et al. Hybrid two-stream dynamic CNN for view adaptive human action recognition using ensemble learning. Int. J. Mach. Learn. & Cyber. 13, 1157–1166 (2022). https://doi.org/10.1007/s13042-021-01441-2
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s13042-021-01441-2