Abstract
With the development of depth sensors and pose estimation algorithms, human skeleton action recognition based on graph convolutional networks has attracted widespread attention and application. Recent methods dynamically learn distinct topologies for modeling and use first-order, second-order, and third-order features, i.e., joint, bone, and motion representations, which has led to high accuracy. However, many models still confuse actions that have similar motion trajectories, and most existing methods model the spatial dimension before the temporal dimension, whereas spatial and temporal information should in fact be interrelated. In this paper, we propose an efficient graph convolutional network based on multi-order feature information (MFGCN) for human skeleton action recognition. Firstly, our method introduces angle features (denoted as fourth-order features), which encode angular information and are implicitly embedded alongside the lower-order features, to capture fine-grained details in the spatio-temporal dimension and enhance the ability to distinguish similar actions. Secondly, we use a content-adaptive approach to construct the adjacency matrix and dynamically learn the topology between the skeleton joints. Finally, we develop a spatio-temporal information sliding extraction module (STISE) to improve the inter-correlation of spatial and temporal information. The proposed method is extensively evaluated on the NTU RGB+D, NTU RGB+D 120, and Northwestern-UCLA datasets, and the experimental results show that our method achieves superior performance compared to current state-of-the-art methods.
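The multi-order features named in the abstract (joint, bone, motion, and angle) can be illustrated with a short sketch. The code below is not the authors' implementation; the joint tensor, the `parents` list, and the choice of cosine between a bone and its parent bone are illustrative assumptions, shown only to make the four feature orders concrete.

```python
import numpy as np

# Hypothetical skeleton clip: T frames, V joints, 3D coordinates.
T, V = 4, 5
np.random.seed(0)
joints = np.random.rand(T, V, 3)  # first-order: joint positions

# Hypothetical parent list defining bones (joint 0 is its own parent, i.e., the root).
parents = [0, 0, 1, 2, 3]

# Second-order: bone vectors (child position minus parent position).
bones = joints - joints[:, parents, :]

# Third-order: motion, i.e., frame-to-frame displacement (zero-padded at the last frame).
motion = np.zeros_like(joints)
motion[:-1] = joints[1:] - joints[:-1]

# Fourth-order (angle) feature: cosine of the angle between each bone and
# its parent bone, giving one scalar per joint per frame.
eps = 1e-8
parent_bones = bones[:, parents, :]
cos_angle = (bones * parent_bones).sum(-1) / (
    np.linalg.norm(bones, axis=-1) * np.linalg.norm(parent_bones, axis=-1) + eps
)
print(cos_angle.shape)  # one angle value per (frame, joint)
```

In practice, such angle values are concatenated with the joint, bone, and motion channels before being fed to the graph convolution, so angular cues are available to every layer rather than computed explicitly inside the network.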
Data availability
The datasets are available from the ROSE Lab at https://rose1.ntu.edu.sg/dataset/actionRecognition/.
Code availability
The code is available from the first author on reasonable request.
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant 62267007 and the Gansu Provincial Department of Education Industrial Support Plan Project under Grant 2022CYZC-16.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors have no conflicts of interest or competing interests to declare that are relevant to the content of this article.
Consent to participate
Not applicable.
Ethical approval
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Qi, Y., Hu, J., Han, X. et al. MFGCN: an efficient graph convolutional network based on multi-order feature information for human skeleton action recognition. Neural Comput & Applic 35, 19979–19995 (2023). https://doi.org/10.1007/s00521-023-08814-4