Abstract
This work proposes a method, aiming the 3D skeleton sequence, for the human action recognition by fusing the attention-based three-stream convolutional neural network and support vector machine. The traditional action recognition methods primarily employ RGB video as input. However, RGB video has issues with respect to large data volume, low semanticity, and ease of making the model interfered by irrelevant information such as the background. The efficient and advanced human action information contained in the 3D skeleton sequence facilitates human behavior recognition. First, the information of 3D coordinates, temporal-difference information, and spatial-difference information of joints are extracted from the raw skeleton data, and the above information is input into the respective convolutional neural networks for pre-training. Then, the pre-trained network model extracts the feature containing the spatial-temporal information. Finally, the mixed feature vectors are input into the support vector machine for training and classification. Under the X-View and X-Sub benchmarks, the accuracy on the open dataset NTU RGB+D is 92.6% and 86.7% respectively, demonstrating that the method proposed for incorporating multistream feature learning, feature fusing, and hybrid model can improve the recognition accuracy.
Similar content being viewed by others
Data Availability
The datasets used or analyzed during the current study are available from the corresponding author on reasonable request.
References
Al-Faris M, Chiverton J P, Yang Y, Ndzi D (2020) Multi-view region-adaptive multi-temporal dmm and rgb action recognition. Pattern Anal Appl 23 (4):1587–1602. https://doi.org/10.1007/s10044-020-00886-5
Bhatti U A, Huang M, Wang H, Zhang Y, Mehmood A, Di W (2018) Recommendation system for immunization coverage and monitoring. Human Vacc Immunotherap 14(1):165–171
Bhatti U A, Huang M, Wu D, Zhang Y, Mehmood A, Han H (2019) Recommendation system using feature extraction and pattern recognition in clinical care systems. Enterprise Inform Syst 13(3):329–351
Bhatti U A, Ming-Quan Z, Huo Q, Ali S, Hussain A, Yan Y, Yu Z, Yuan L, Nawaz S A (2021) Advanced color edge detection using clifford algebra in satellite images. IEEE Photonics J 13(2)
Bhatti U A, Nizamani M M, Huang M (2022) Climate change threatens Pakistan’s snow leopards. Science 377(6606):585–586. https://doi.org/10.1126/science.add9065
Bhatti U A, Yan Y, Zhou M, Ali S, Hussain A, Huo Q, Yu Z, Yuan L (2021) Time series analysis and forecasting of air pollution particulate matter (pm2.5): an sarima and factor analysis approach. IEEE Access 9:41019–41031
Bhatti U A, Yuan L, Yu Z, Li J, Nawaz S A, Mehmood A, Zhang K (2021) New watermarking algorithm utilizing quaternion fourier transform with advanced scrambling and secure encryption. Multimed Tools Applic 80(9):13367–13387
Bhatti U A, Yu Z, Chanussot J, Zeeshan Z, Yuan L, Luo W, Nawaz S A, Bhatti M A, Ain Q U, Mehmood A (2022) Local similarity-based spatial-spectral fusion hyperspectral image classification with deep cnn and gabor filtering. IEEE Trans Geosci Remote Sens 60:1–15
Caetano C, Brémond F, Schwartz W R (2019) Skeleton image representation for 3d action recognition based on tree structure and reference joints. In: 2019 32nd SIBGRAPI conference on graphics, patterns and images (SIBGRAPI). IEEE, pp 16–23
Chen J, Ho C M, Soc I C (2022) Mm-vit: multi-modal video transformer for compressed video action recognition. In: 22nd IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). IEEE Winter Conference on Applications of Computer Vision, pp 786–797
Dan Y, Jingbing L, Yangxiu F, Wenfeng C, Xiliang X, Bhatti U A, Baoru H (2021) A robust zero-watermarkinging algorithm based on phts-dct for medical images in the encrypted domain. Innovation in Medicine and Healthcare. Proceedings of 9th KES-InMed 2021. Smart Innovation, Systems and Technologies, pp 101–13
Dang L M, Min K, Wang H, Piran M J, Lee C H, Moon H (2020) Sensor-based and vision-based human activity recognition: a comprehensive survey. Pattern Recogn, 108. https://doi.org/10.1016/j.patcog.2020.107561
Ding W, Ding C, Li G, Liu K (2021) Skeleton-based square grid for human action recognition with 3d convolutional neural network. IEEE Access 9:54078–54089
Du Y, Wang W, Wang L (2015) Hierarchical recurrent neural network for skeleton based action recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1110–1118
Duan H, Zhao Y, Chen K, Lin D, Dai B, Ieee Comp, S O C (2022) Revisiting skeleton-based action recognition. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Conference on Computer Vision and Pattern Recognition, pp 2959–2968. https://doi.org/10.1109/cvpr52688.2022.00298
Feng D, Wu Z, Zhang J, Ren T (2021) Multi-scale spatial temporal graph neural network for skeleton-based action recognition. IEEE Access 9:58256–58265
Feng L, Zhao Y, Zhao W, Tang J (2022) A comparative review of graph convolutional networks for human skeleton-based action recognition. Artif Intell Rev, 4275–4305. https://doi.org/10.1007/s10462-021-10107-y
Han F, Reily B, Hoff W, Zhang H (2017) Space-time representation of people based on 3d skeletal data: a review. Comput Vis Image Underst 158:85–105
Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning (ICML). PMLR , pp 448–456
Ke Q, Bennamoun M, An S, Sohel F, Boussaid F (2017) A new representation of skeleton sequences for 3d action recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 3288–3297
Kennedy-Metz L R, Mascagni P, Torralba A, Dias R D, Perona P, Shah J A, Padoy N, Zenati M A (2020) Computer vision in the operating room: opportunities and caveats. IEEE Trans Med Robot Bion 3(1): 2–10
Koniusz P, Cherian A, Porikli F (2016) Tensor representations via kernel linearization for action recognition from 3d skeletons. In: European Conference on Computer Vision (ECCV). Springer, pp 37–53
Li C, Zhong Q, Xie D, Pu S (2017) Skeleton-based action recognition with convolutional neural networks. In: IEEE International Conference on Multimedia & Expo Workshops (ICMEW). IEEE , pp 597–600
Li C, Zhong Q, Xie D, Pu S (2018) Co-occurrence feature learning from skeleton data for action recognition and detection with hierarchical aggregation. In: International Joint Conference on Artificial Intelligence (IJCAI)
Li S, Li W, Cook C, Zhu C, Gao Y (2018) Independently recurrent neural network (indrnn): building a longer and deeper rnn. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 5457–5466
Li M S, Chen S H, Chen X, Zhang Y, Wang Y F, Tian Q, Soc I C (2019) Actional-structural graph convolutional networks for skeleton-based action recognition. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Conference on computer vision and pattern recognition, pp 3590–3598
Li T, Li J, Liu J, Huang M, Chen Y-W, Bhatti U A (2022) Robust watermarking algorithm for medical images based on log-polar transform. Eurasip J Wireless Commun Network 2022:1. https://doi.org/10.1186/s13638-022-02106-6
Li Y, Li J, Shao C, Bhatti U A, Ma J (2022) Robust multi-watermarking algorithm for medical images using patchwork-dct. In: 8th International Conference on Artificial Intelligence and Security (ICAIS). Lecture notes in computer science, vol 13340, pp 386–399, DOI https://doi.org/10.1007/978-3-031-06791-4_31
Liang D, Fan G, Lin G, Chen W, Zhu H (2019) Three-stream convolutional neural network with multi-task and ensemble learning for 3d action recognition. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
Lin Z, Zhang W, Deng X, Ma C, Wang H (2020) Image-based pose representation for action recognition and hand gesture recognition, 532–539
Liu J, Shahroudy A, Xu D, Wang G (2016) Spatio-temporal lstm with trust gates for 3d human action recognition. In: European Conference on Computer Vision (ECCV). Springer, pp 816–833
Liu A-A, Shao Z, Wong Y, Li J, Su Y-T, Kankanhalli M (2019) Lstm-based multi-label video event detection. Multimed Tools Applic 78 (1):677–695. https://doi.org/10.1007/s11042-017-5532-x
Liu Z Y, Zhang H W, Chen Z H, Wang Z Y, Ouyang W L, Ieee (2020) Disentangling and unifying graph convolutions for skeleton-based action recognition. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Conference on computer vision and pattern recognition, pp 140–149. https://doi.org/10.1109/cvpr42600.2020.00022
Liu W, Li J, Shao C, Ma J, Huang M, Bhatti U A (2022) Robust zero watermarking algorithm for medical images using local binary pattern and discrete cosine transform. Advances in artificial intelligence and security: 8th international conference on artificial intelligence and security, ICAIS 2022, Proceedings. Communications in computer and information science
Mazzia V, Angarano S, Salvetti F, Angelini F, Chiaberge M (2022) Action transformer: a self-attention model for short-time pose-based human action recognition. Pattern Recogn, 124
Nguyen V-T, Nguyen T-N, Le T-L, Pham D-T, Vu H (2021) Adaptive most joint selection and covariance descriptions for a robust skeleton-based human action recognition. Multimed Tools Applic 80(18):27757–27783
Pan H, Chen Y (2019) Multilevel lstm for action recognition based on skeleton sequence. In: 2019 IEEE 21st international conference on high performance computing and communications; IEEE 17th International conference on smart city; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS). IEEE, pp 2218–2223
Ruiz A H, Porzi L, Bulo S R, Moreno-Noguer F (2017) 3d cnns on distance matrices for human action recognition, 1087–1095. https://doi.org/10.1145/3123266.3123299
Shahroudy A, Liu J, Ng T-T, Wang G (2016) Ntu rgb+d: a large scale dataset for 3d human activity analysis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1010–1019
Shao Z, Han J, Marnerides D, Debattista K (2022) Region-object relation-aware dense captioning via transformer. IEEE Transactions on Neural Networks and Learning Systems. https://doi.org/10.1109/tnnls.2022.3152990
Shao Z, Han J, Debattista K, Pang Y (2023) Textual context-aware dense captioning with diverse words, 1–15. https://doi.org/10.1109/TMM.2023.3241517
Shen X, Ding Y (2022) Human skeleton representation for 3d action recognition based on complex network coding and lstm. J Vis Commun Image Represent 82:103386. https://doi.org/10.1016/j.jvcir.2021.103386
Shen X, Ding Y (2022) Human skeleton representation for 3d action recognition based on complex network coding and lstm. J Vis Commun Image Represent 82:103386. https://doi.org/10.1016/j.jvcir.2021.103386
Shi L, Zhang Y, Cheng J, Lu H (2019) Skeleton-based action recognition with directed graph neural networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 7904–7913. https://doi.org/10.1109/CVPR.2019.00810
Shi L, Zhang Y, Cheng J, Lu H (2019) Two-stream adaptive graph convolutional networks for skeleton-based action recognition. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 12018–12027. https://doi.org/10.1109/CVPR.2019.01230
Shi L, Zhang Y F, Cheng J, Lu H Q (2020) Skeleton-based action recognition with multi-stream adaptive graph convolutional networks. IEEE Trans Image Process 29:9532–9545. https://doi.org/10.1109/tip.2020.3028207
Si C, Jing Y, Wang W, Wang L, Tan T (2018) Skeleton-based action recognition with spatial reasoning and temporal stack learning. In: Proceedings of the European conference on computer vision (ECCV), pp 103–118
Singla M, Ghosh D, Shukla KK (2020) A survey of robust optimization based machine learning with special reference to support vector machines. Int J Mach Learn Cybern 11(7):1359–1385
Su B, Wu H, Sheng M, Shen C (2019) Accurate hierarchical human actions recognition from kinect skeleton data. IEEE Access 7:52532–52541
Tang Y, Tian Y, Lu J, Li P, Zhou J (2018) Deep progressive reinforcement learning for skeleton-based action recognition. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 5323–5332. https://doi.org/10.1109/CVPR.2018.00558
Tong A, Tang C, Wang W (2022) Semi-supervised action recognition from temporal augmentation using curriculum learning. IEEE Trans Circuits Syst Video Technol, 1–1. https://doi.org/10.1109/TCSVT.2022.3210271
Vemulapalli R, Chellapa R (2016) Rolling rotations for recognizing human actions from 3d skeletal data. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , pp 4471–4479
Wang L, Huynh D Q, Koniusz P (2020) A comparative review of recent kinect-based action recognition algorithms. IEEE Trans Image Process 29:15–28. https://doi.org/10.1109/TIP.2019.2925285
Woo S, Park J, Lee J-Y, Kweon I S (2018) Cbam: convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 3–19
Wu H, Ma X, Li Y (2022) Spatiotemporal multimodal learning with 3d cnns for video action recognition. IEEE Trans Circ Syst Video Technol 32 (3):1250–1261. https://doi.org/10.1109/TCSVT.2021.3077512
Xia L, Chen C-C, Aggarwal J K (2012) View invariant human action recognition using histograms of 3d joints. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE, pp 20–27
Xiliang X, Jingbing L, Dan Y, Yangxiu F, Wenfeng C, Bhatti U A, Baoru H (2021) Robust zero watermarking algorithm for encrypted medical images based on dwt-gabor. Innovation in Medicine and Healthcare. Proceedings of 9th KES-InMed 2021. Smart Innovation, Systems and Technologies. https://doi.org/10.1007/978-981-16-3013-2_7
Xu W, Wu M, Zhu J, Zhao M (2021) Multi-scale skeleton adaptive weighted gcn for skeleton-based human action recognition in iot. Appl Soft Comput, 104
Yan S, Xiong Y, Lin D (2018) Spatial temporal graph convolutional networks for skeleton-based action recognition. In: AAAI Conference on Artificial Intelligence, pp 7444–7452
Yangxiu F, Jing L, Jingbing L, Dan Y, Wenfeng C, Xiliang X, Baoru H, Bhatti U A (2021) A novel robust watermarking algorithm for encrypted medical image based on Bandelet-DCT. https://doi.org/10.1007/978-981-16-3013-2_6
Yu L, Tian L, Du Q, Bhutto J A (2022) Multi-stream adaptive 3d attention graph convolution network for skeleton-based action recognition. Appl Intell
Yue R, Tian Z, Du S (2022) Action recognition based on rgb and skeleton data sets: a survey. Neurocomputing 512:287–306. https://doi.org/10.1016/j.neucom.2022.09.071
Zeeshan Z, ul Ain Q, Bhatti U A, Memon W H, Ali S, Nawaz S A, Nizamani M M, Mehmood A, Bhatti M A, Shoukat M U (2021) Feature-based multi-criteria recommendation system using a weighted approach with ranking correlation. Intell Data Anal 25(4):1013–1029
Zeng C, Liu J, Li J, Cheng J, Zhou J, Nawaz S A, Xiao X, Bhatti U A (2022) Multi-watermarking algorithm for medical image based on kaze-dct. J Ambient Intell Humaniz Comput, https://doi.org/10.1007/s12652-021-03539-5
Zhang S, Yang Y, Xiao J, Liu X, Yang Y, Xie D, Zhuang Y (2018) Fusing geometric features for skeleton-based action recognition using multilayer lstm networks. IEEE Trans Multimed 20(9):2330–2343. https://doi.org/10.1109/TMM.2018.2802648
Zhang J, Lou Y, Wang J, Wu K, Lu K, Jia X (2021) Evaluating adversarial attacks on driving safety in vision-based autonomous vehicles. IEEE Internet Things J 9(5):3443–3456
Zheng Z, An G, Wu D, Ruan Q (2019) Spatial-temporal pyramid based convolutional neural network for action recognition. Neurocomputing 358:446–455. https://doi.org/10.1016/j.neucom.2019.05.058
Zhu W, Lan C, Xing J, Zeng W, Li Y, Shen L, Xie X (2016) Co-occurrence feature learning for skeleton based action recognition using regularized deep lstm networks. In: Proceedings of the AAAI conference on artificial intelligence, vol 30
Zhuang Q, Gan S, Zhang L (2022) Human-computer interaction based health diagnostics using resnet34 for tongue image classification. Comput Methods Programs Biomed 226:107096. https://doi.org/10.1016/j.cmpb.2022.107096
Acknowledgements
This work was supported in part by the National Natural Science Foundation (62076154, U21A20513), the Anhui Provincial Natural Science Foundation (2008085MF202), the University Natural Sciences Research Project of Anhui Province (KJ2020A0660), and the Open Project of Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, Anhui University (MMC202003), Scientific Research Projects for Graduate Students in Anhui Universities (YJS20210564), Anhui Province Student Innovation Training Project (S202111059266, S202111059016).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of Interests
The authors have no relevant financial or non-financial interests to disclose.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ren, F., Tang, C., Tong, A. et al. Skeleton-based human action recognition by fusing attention based three-stream convolutional neural network and SVM. Multimed Tools Appl 83, 6273–6295 (2024). https://doi.org/10.1007/s11042-023-15334-9
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11042-023-15334-9