Spatio-temporal stacking model for skeleton-based action recognition

Zhong, Yufeng; Yan, Qiuyan

doi:10.1007/s10489-021-02994-z

Published: 01 February 2022

Applied Intelligence, volume 52, pages 12116–12130 (2022)

Abstract

Due to the prevalence of affordable depth sensors, skeleton-based action recognition has attracted much attention as a significant computer vision task. State-of-the-art recognition precision usually comes from complicated deep learning networks that need a large quantity of training data. In contrast, non-deep-learning methods are easy to train and understand, but they have limited expressive ability to extract the spatial and temporal features of skeleton data simultaneously. It is therefore challenging to use a shallow learning architecture to effectively identify complex actions in skeleton data. In this paper, we first combine Temporal Hierarchy Pyramid (THP) and Symmetric Positive Definite (SPD) features to simultaneously capture the inter-frame temporal relationships and the intra-frame spatial relationships. Then, to achieve the same learning ability as a deep learning network for a non-linear system, we propose a novel stacking ensemble-based method to effectively identify complex actions in skeleton data. We carry out extensive verification of our method on widely used 3D action recognition datasets. The experimental results indicate that we achieve state-of-the-art performance on all compared datasets.
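The abstract does not spell out how the features or the ensemble are built, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes that the SPD feature of a temporal segment is a regularized covariance matrix of the joint coordinates, that the pyramid splits a sequence into 1, 2, 4, ... equal segments, that each matrix is flattened through a matrix logarithm, and that the stacking ensemble uses scikit-learn with an SVM and a random forest as base learners and logistic regression as the meta-learner. All function names and parameter values are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC


def spd_descriptor(segment, eps=1e-6):
    """Covariance of joint coordinates over a segment (assumed SPD feature).

    segment: array of shape (frames, dims) with dims = joints * 3 and
    at least two frames. A small ridge keeps the matrix strictly positive
    definite even when there are fewer frames than dimensions.
    """
    cov = np.cov(segment, rowvar=False)
    return cov + eps * np.eye(cov.shape[0])


def log_euclidean_vector(spd):
    """Flatten an SPD matrix: matrix logarithm, then upper triangle."""
    eigvals, eigvecs = np.linalg.eigh(spd)
    log_m = (eigvecs * np.log(eigvals)) @ eigvecs.T
    return log_m[np.triu_indices(spd.shape[0])]


def thp_spd_features(sequence, levels=2):
    """Concatenate SPD descriptors over a temporal pyramid.

    Level k splits the sequence into 2**k contiguous segments, so the
    descriptor sees both the whole action and its coarse temporal order.
    """
    feats = []
    for level in range(levels):
        for segment in np.array_split(sequence, 2 ** level, axis=0):
            feats.append(log_euclidean_vector(spd_descriptor(segment)))
    return np.concatenate(feats)


def build_stacking_model():
    """Stacking ensemble; the choice of learners is an assumption."""
    base_learners = [
        ("svm", SVC(random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ]
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=3,  # out-of-fold base predictions train the meta-learner
    )


if __name__ == "__main__":
    # Synthetic stand-in for skeleton data: 24 sequences, 16 frames, 4 joints.
    rng = np.random.default_rng(0)
    labels = np.array([i % 2 for i in range(24)])
    sequences = [rng.normal(scale=1.0 + c, size=(16, 12)) for c in labels]
    X = np.stack([thp_spd_features(s) for s in sequences])
    model = build_stacking_model().fit(X, labels)
    print(X.shape, model.predict(X[:4]))
```

In this sketch the shallow base learners each see the full spatio-temporal descriptor, and the meta-learner combines their out-of-fold predictions, which is the standard stacked-generalization recipe rather than anything specific to this paper.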

Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (Grant Nos. 61977061, 51934007, and 61876186) and the National Innovation and Entrepreneurship Training Program for Undergraduates (No. 201910290055Z).

Author information

Authors and Affiliations

School of Computer Science Technology, China University of Mining and Technology, Xuzhou, 221116, Jiangsu, China
Yufeng Zhong & Qiuyan Yan
Research Center of Innovation on Intelligent Prevention of Disaster and Emergency Rescue, China University of Mining and Technology, Xuzhou, 221116, Jiangsu, China
Qiuyan Yan

Corresponding author

Correspondence to Qiuyan Yan.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhong, Y., Yan, Q. Spatio-temporal stacking model for skeleton-based action recognition. Appl Intell 52, 12116–12130 (2022). https://doi.org/10.1007/s10489-021-02994-z

Accepted: 26 October 2021
Published: 01 February 2022
Issue Date: September 2022
DOI: https://doi.org/10.1007/s10489-021-02994-z
