Generic enhanced ensemble learning with multi-level kinematic constraints for 3D action recognition

Multimedia Tools and Applications

Abstract

The 3D human body skeleton conveys rich information about human action and is regarded as an important data modality for action recognition. Because human actions are diverse and skeleton data are noisy, skeleton-based action recognition methods must both suppress interference from irrelevant data and learn enough valid information about human action. Previous research has produced a variety of effective skeleton features and many deep network models with strong learning abilities. However, a single model using a single feature cannot make full use of the valid information in the skeleton. To address this problem, this paper proposes multi-level kinematic constraints for constructing multiple skeleton features. Applying constraints at different levels yields a set of features that carry information ranging from local to global. The variability among these features leads to significant variability among the classifiers trained on them, which enhances the ensemble performance of these classifiers. Extensive experiments on three representative datasets and four kinds of classification models demonstrate the generality of the proposed method. It substantially improves multiple kinds of existing well-performing models, and it surpasses most state-of-the-art skeleton-based action recognition methods.
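
The ensemble idea in the abstract can be illustrated with a minimal sketch. The abstract does not state how the classifier outputs are combined, so the code below assumes a common late-fusion rule: average the softmax scores of classifiers trained on different features, then take the highest-scoring class. The function names, the three "levels", and the toy scores are all hypothetical and are not taken from the paper.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def ensemble_predict(logits_per_model):
    """Average class probabilities over models (assumed fusion rule).

    logits_per_model: list of arrays of shape (n_samples, n_classes),
    one per classifier; here, hypothetically, one per feature level.
    Returns (predicted_labels, mean_probabilities).
    """
    probs = np.stack([softmax(l) for l in logits_per_model])
    mean_probs = probs.mean(axis=0)  # shape (n_samples, n_classes)
    return mean_probs.argmax(axis=1), mean_probs

# Toy scores for 2 samples and 3 classes from three hypothetical classifiers.
local_logits = np.array([[2.0, 1.0, 0.0], [0.0, 1.0, 0.5]])
mid_logits = np.array([[0.5, 1.5, 0.0], [0.0, 2.0, 0.0]])
global_logits = np.array([[3.0, 0.0, 0.0], [0.2, 0.1, 1.0]])

labels, mean_probs = ensemble_predict([local_logits, mid_logits, global_logits])
print(labels)
```

In this toy input the individual classifiers disagree on both samples, and the averaged scores settle each disagreement. That is the kind of benefit an ensemble can draw from classifiers that vary because their input features vary.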

Data Availability

All data generated or analysed during this study are included in this published article.

Code Availability

Not applicable.

Notes

  1. FLOPs: floating-point operations

Acknowledgements

This work was supported by the National Key Research and Development Program of China under Grant 2018YFB2003500.

Funding

This work was supported by the National Key Research and Development Program of China under Grant 2018YFB2003500.

Author information

Contributions

All authors of this research paper have directly participated in the planning, execution, or analysis of the study.

Corresponding author

Correspondence to Xue Wang.

Ethics declarations

Ethics approval

Not applicable.

Consent for Publication

All authors of this paper have read and approved the final version submitted.

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Consent to participate

Not applicable.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

You, W., Wang, X., Zhang, W. et al. Generic enhanced ensemble learning with multi-level kinematic constraints for 3D action recognition. Multimed Tools Appl 81, 9685–9711 (2022). https://doi.org/10.1007/s11042-022-11919-y

  • DOI: https://doi.org/10.1007/s11042-022-11919-y
