
Efficient combination of classifiers for 3D action recognition

  • Regular Paper
  • Published in: Multimedia Systems

Abstract

The popular task of 3D human action recognition is almost exclusively solved by training deep-learning classifiers. To achieve high recognition accuracy, input 3D actions are often pre-processed by various normalization or augmentation techniques. However, it is not computationally feasible to train a classifier for each possible variant of training data in order to select the best-performing combination of pre-processing techniques for a given dataset. In this paper, we propose an evaluation procedure that determines the best combination very efficiently. In particular, we train only one independent classifier for each available pre-processing technique and estimate the accuracy of a specific combination by efficiently fusing the corresponding classification results based on a strict majority-vote rule. In addition, for the best-ranked combination, we can retrospectively apply the normalized/augmented variants of the input data to train only a single classifier. This makes it possible to decide whether it is generally better to train a single model, or rather a set of independent classifiers whose results are fused within the classification phase. We conduct experiments on single-subject as well as person-interaction datasets of 3D skeleton sequences, considering all combinations of up to 16 normalization and augmentation techniques, some of which are proposed in this paper.
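
To illustrate the fusion step described above, the following Python sketch shows how a strict majority-vote rule can estimate the accuracy of any subset of pre-trained classifiers from their cached per-sample predictions, so that no additional classifier needs to be trained per combination. This is a minimal sketch of the general idea; the function names and data layout are illustrative assumptions, not the paper's actual implementation.

```python
from itertools import combinations
from collections import Counter

def strict_majority_vote(votes):
    """Return the label predicted by a strict majority of voters,
    or None if no label is supported by more than half of them."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None

def combination_accuracy(predictions, ground_truth, subset):
    """Estimate the accuracy of a classifier combination by fusing the
    cached predictions of its members with a strict majority vote.

    predictions : dict mapping technique name -> list of predicted labels
    ground_truth: list of true labels, in the same sample order
    subset      : iterable of technique names forming the combination
    """
    correct = 0
    for i, truth in enumerate(ground_truth):
        votes = [predictions[name][i] for name in subset]
        if strict_majority_vote(votes) == truth:
            correct += 1
    return correct / len(ground_truth)

def best_combination(predictions, ground_truth):
    """Rank all non-empty subsets of pre-processing techniques without
    training any new classifier: only cached predictions are fused."""
    techniques = list(predictions)
    best, best_acc = None, -1.0
    for k in range(1, len(techniques) + 1):
        for subset in combinations(techniques, k):
            acc = combination_accuracy(predictions, ground_truth, subset)
            if acc > best_acc:
                best, best_acc = subset, acc
    return best, best_acc
```

With 16 techniques there are 2^16 - 1 = 65,535 non-empty combinations; each is scored by a cheap voting pass over predictions cached from the 16 already-trained classifiers, rather than by 65,535 separate training runs, which is what makes the exhaustive ranking feasible.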





Acknowledgements

This research is supported by the Czech Science Foundation project No. GA19-02033S.

Author information


Corresponding author

Correspondence to Jan Sedmidubsky.

Additional information

Communicated by B. Þór Jónsson.



About this article


Cite this article

Sedmidubsky, J., Zezula, P. Efficient combination of classifiers for 3D action recognition. Multimedia Systems 27, 941–952 (2021). https://doi.org/10.1007/s00530-021-00767-9
