Generic enhanced ensemble learning with multi-level kinematic constraints for 3D action recognition

You, Wei; Wang, Xue; Zhang, Weihang; Qiang, Zhenfeng

doi:10.1007/s11042-022-11919-y

Published: 12 February 2022

Volume 81, pages 9685–9711, (2022)
Multimedia Tools and Applications

Wei You¹,
Xue Wang ORCID: orcid.org/0000-0003-4842-3160¹,
Weihang Zhang¹ &
…
Zhenfeng Qiang¹

Abstract

The 3D human body skeleton conveys rich information about human action and is regarded as an important data modality for action recognition. Because human actions are diverse and skeleton data are noisy, skeleton-based action recognition methods must both suppress interference from irrelevant data and learn enough valid information about the action. Previous research has produced a variety of effective skeleton features and many deep network models with strong learning abilities. However, a single model using a single feature cannot make full use of the valid information in the skeleton. To address this problem, this paper proposes multi-level kinematic constraints for constructing multiple skeleton features. By applying different levels of constraints, a set of features containing information ranging from local to global is extracted. The variability among these features leads to significant variability among the classifiers trained on them, which enhances the ensemble performance of these classifiers. Extensive experiments on three representative datasets and four kinds of classification models demonstrate the generality of the proposed method. It yields a substantial improvement on multiple kinds of existing well-performing models and surpasses most state-of-the-art skeleton-based action recognition methods.
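
The ensemble idea in the abstract, several classifiers each trained on a different skeleton feature and then combined, can be illustrated with a minimal late-fusion sketch. This is not the authors' implementation: the abstract does not state the fusion rule, so averaging softmax scores is an assumption, and the names `softmax`, `fuse_scores`, `logits_local`, and `logits_global` are hypothetical, as are the toy numbers.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, subtracting the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def fuse_scores(logits_per_classifier, weights=None):
    """Weighted average of class probabilities over classifiers.

    logits_per_classifier: list of arrays of shape (n_samples, n_classes),
        one per classifier (assumed: each trained on a different feature).
    weights: optional per-classifier weights; uniform when omitted.
    """
    probs = np.stack([softmax(l) for l in logits_per_classifier])  # (K, N, C)
    if weights is None:
        weights = np.ones(len(logits_per_classifier))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalise so the result is a distribution
    return np.tensordot(weights, probs, axes=1)  # (N, C)


# Toy scores for two samples and three classes (made-up values).
logits_local = np.array([[2.0, 1.0, 0.0], [0.0, 0.5, 0.2]])
logits_global = np.array([[0.0, 3.0, 0.0], [0.0, 0.0, 2.0]])

fused = fuse_scores([logits_local, logits_global])
pred = fused.argmax(axis=1)
```

In this toy case the two classifiers disagree on the first sample, and the fused score follows the more confident one. Averaging of this kind only helps when the member classifiers make different errors, which is the role the abstract assigns to the variability among features.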

Data Availability

All data generated or analysed during this study are included in this published article.

Code Availability

Not applicable.

Notes

FLoating-number OPerations

Acknowledgements

This work was supported by the National Key Research and Development Program of China under Grant 2018YFB2003500.

Funding

This work was supported by the National Key Research and Development Program of China under Grant 2018YFB2003500.

Author information

Authors and Affiliations

Department of Precision Instrument, Tsinghua University, Beijing, 100084, China
Wei You, Xue Wang, Weihang Zhang & Zhenfeng Qiang

Contributions

All authors of this research paper have directly participated in the planning, execution, or analysis of the study.

Corresponding author

Correspondence to Xue Wang.

Ethics declarations

Ethics approval

Not applicable.

Consent for Publication

All authors of this paper have read and approved the final version submitted.

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Consent to participate

Not applicable.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

You, W., Wang, X., Zhang, W. et al. Generic enhanced ensemble learning with multi-level kinematic constraints for 3D action recognition. Multimed Tools Appl 81, 9685–9711 (2022). https://doi.org/10.1007/s11042-022-11919-y

Received: 02 June 2021
Revised: 31 August 2021
Accepted: 03 January 2022
Published: 12 February 2022
Issue Date: March 2022
DOI: https://doi.org/10.1007/s11042-022-11919-y
