Abstract
Gait recognition is attractive as a biometric because, among other advantages, it works without physical contact. Skeleton-based recognition is often preferred because silhouette-based methods are sensitive to self-occlusion and environmental factors. To exploit the discriminative properties of both long-term and short-term temporal cues, we propose spatiotemporal smoothing aggregation enhanced multiscale residual deep graph convolutional networks. The method considers both long and short gait feature time series, enabling it to learn discriminative multiscale representations. In the baseline network, features at three scales are extracted sequentially, followed by a reverse process that extracts and fuses the multiscale features. This design strengthens the ability of graph convolution to model the contextual knowledge of human poses. We also investigate multiscale gait feature aggregation and introduce a spatiotemporal smoothing aggregation module with an embedded attention mechanism that hierarchically aggregates and enhances multiscale key joint features, which alleviates oversmoothing in deep graph convolutional networks. The method was evaluated on the Chinese Academy of Sciences Institute of Automation (CASIA-B) dataset, achieving an average accuracy of 78.2%, the second highest among the skeleton-based gait recognition models available at the time of writing, and attained rank-1 accuracies of 14.7% and 8.19% on the Gait Recognition in the Wild (GREW) and Gait3D datasets, respectively.
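For readers unfamiliar with the rank-1 metric reported above, the following is a minimal sketch of how such a score is commonly computed: each probe embedding is matched to its nearest gallery embedding, and the score is the fraction of probes whose nearest match has the correct identity. The function name `rank1_accuracy`, the toy 2-D embeddings, and the use of Euclidean distance are illustrative assumptions; the abstract does not state the distance metric or the evaluation code used in the study.

```python
import math


def rank1_accuracy(probe_feats, probe_ids, gallery_feats, gallery_ids):
    """Fraction of probes whose nearest gallery embedding has the same identity."""
    if not probe_feats:
        raise ValueError("need at least one probe sequence")
    hits = 0
    for feat, pid in zip(probe_feats, probe_ids):
        # Index of the closest gallery embedding.
        # Euclidean distance is an assumption made for this sketch.
        nearest = min(
            range(len(gallery_feats)),
            key=lambda j: math.dist(feat, gallery_feats[j]),
        )
        hits += gallery_ids[nearest] == pid
    return hits / len(probe_feats)


# Hypothetical toy data: three enrolled subjects and four probe sequences.
gallery_feats = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
gallery_ids = ["A", "B", "C"]
probe_feats = [(0.1, 0.0), (0.9, 0.1), (0.1, 0.8), (0.6, 0.1)]
probe_ids = ["A", "B", "C", "A"]

accuracy = rank1_accuracy(probe_feats, probe_ids, gallery_feats, gallery_ids)
print(f"rank-1 accuracy: {accuracy:.2f}")
```

In this toy example the last probe belongs to subject A but lies closer to subject B's gallery embedding, so three of the four probes are matched correctly.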

Data availability and access
The research in this paper is based on three publicly available datasets: the CASIA-B Gait Database (http://www.cbsr.ia.ac.cn/english/Gait%20Databases.asp), Gait Recognition in the Wild (GREW, https://www.grew-benchmark.org/), and the Gait3D dataset (https://gait3d.github.io/). Use of each dataset requires permission from its owner.
Code availability
Due to legal considerations, we are unable to open-source the code for this study at this time. We appreciate your understanding and support.
Funding
This study was funded by Shenzhen Startup Funding (No. QD2023014C) and the National Natural Science Foundation of China (No. 61906074).
Author information
Contributions
Guanghai Chen developed and implemented the algorithms and models used in the study and wrote the first draft. Xin Chen was responsible for the project design and overall plan of the study. Chengzhi Zheng was responsible for data organization. Junshu Wang was responsible for reviewing the manuscript and suggesting important changes to its content. Xinchao Liu was responsible for the visualization and graphing of the experimental results. Yuxing Han is the corresponding author of this study. The first and second authors (Guanghai Chen and Xin Chen) contributed equally to this work and should be considered co-first authors.
Ethics declarations
Competing interests
The authors have no relevant financial or non-financial interests to disclose.
Ethical and informed consent for data used
Not applicable. The work in this paper involves no human or animal experimentation and raises no ethical or moral concerns. The work presented in this article is entirely original, has not been published in any other journal, and has been submitted exclusively to this journal. There are no violations of academic ethics. The right to use the data in this study has been approved by the data owners.
Consent to participate
All participants in this study were informed and consented to participate in the study.
Consent for publication
Participants in this study gave their consent for the results to be used for publication, presentation, or sharing.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, G., Chen, X., Zheng, C. et al. Spatiotemporal smoothing aggregation enhanced multi-scale residual deep graph convolutional networks for skeleton-based gait recognition. Appl Intell 54, 6154–6174 (2024). https://doi.org/10.1007/s10489-024-05422-0
DOI: https://doi.org/10.1007/s10489-024-05422-0