Abstract
The popular task of 3D human action recognition is almost exclusively solved by training deep-learning classifiers. To achieve high recognition accuracy, input 3D actions are often pre-processed by various normalization or augmentation techniques. However, it is not computationally feasible to train a classifier for each possible variant of training data in order to select the best-performing combination of pre-processing techniques for a given dataset. In this paper, we propose an evaluation procedure that determines the best combination in a very efficient way. In particular, we train only one independent classifier for each available pre-processing technique and estimate the accuracy of a specific combination by efficiently fusing the corresponding classification results based on a strict majority vote rule. In addition, for the best-ranked combination, we can retrospectively apply the normalized/augmented variants of input data to train only a single classifier. This makes it possible to decide whether it is generally better to train a single model, or rather a set of independent classifiers whose results are fused within the classification phase. We evaluate the proposed procedure on single-subject as well as person-interaction datasets of 3D skeleton sequences and all combinations of up to 16 normalization and augmentation techniques, some of which are proposed in this paper.
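The strict-majority-vote fusion described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the representation of predictions (one label per classifier, one classifier per pre-processing technique) are assumptions for the sake of the example.

```python
from collections import Counter

def strict_majority_vote(predictions):
    """Fuse the labels predicted by independent classifiers.

    predictions: a list of class labels, one per classifier in the
    evaluated combination of pre-processing techniques.
    Returns the label chosen by a strict majority (more than half of
    the classifiers), or None when no strict majority exists.
    """
    label, votes = Counter(predictions).most_common(1)[0]
    return label if votes > len(predictions) / 2 else None

def estimate_accuracy(per_action_predictions, ground_truth):
    """Estimate the accuracy of a combination of techniques.

    An action counts as correctly classified only when a strict
    majority of the combination's classifiers agrees on the
    ground-truth label.
    """
    correct = sum(
        1
        for preds, gt in zip(per_action_predictions, ground_truth)
        if strict_majority_vote(preds) == gt
    )
    return correct / len(ground_truth)
```

Because the per-classifier predictions are computed once, estimating the accuracy of any subset of techniques reduces to re-running this cheap vote over cached labels instead of retraining a classifier per combination.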
Acknowledgements
This research is supported by the Czech Science Foundation project No. GA19-02033S.
Communicated by B. Þór Jónsson.
Cite this article
Sedmidubsky, J., Zezula, P. Efficient combination of classifiers for 3D action recognition. Multimedia Systems 27, 941–952 (2021). https://doi.org/10.1007/s00530-021-00767-9