Abstract
The popular task of 3D human action recognition is almost exclusively solved by training deep-learning classifiers. To achieve high recognition accuracy, input 3D actions are often pre-processed by various normalization or augmentation techniques. However, it is not computationally feasible to train a classifier for each possible variant of training data in order to select the best-performing combination of pre-processing techniques for a given dataset. In this paper, we propose an evaluation procedure that determines the best combination in a very efficient way. In particular, we train only one independent classifier for each available pre-processing technique and estimate the accuracy of a specific combination by efficiently fusing the corresponding classification results based on a strict majority vote rule. In addition, for the best-ranked combination, we can retrospectively apply the normalized/augmented variants of input data to train only a single classifier. This makes it possible to decide whether it is generally better to train a single model, or rather a set of independent classifiers whose results are fused within the classification phase. We evaluate the proposed procedure on single-subject as well as person-interaction datasets of 3D skeleton sequences and all combinations of up to 16 normalization and augmentation techniques, some of which are proposed in this paper.
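The strict-majority-vote fusion described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the representation of predictions (one label per classifier, one classifier per pre-processing technique) are assumptions for the sake of the example.

```python
from collections import Counter

def strict_majority_vote(predictions):
    """Fuse the labels predicted by independent classifiers.

    predictions: a list of class labels, one per classifier in the
    evaluated combination of pre-processing techniques.
    Returns the label chosen by a strict majority (more than half of
    the classifiers), or None when no strict majority exists.
    """
    label, votes = Counter(predictions).most_common(1)[0]
    return label if votes > len(predictions) / 2 else None

def estimate_accuracy(per_action_predictions, ground_truth):
    """Estimate the accuracy of a combination of techniques.

    An action counts as correctly classified only when a strict
    majority of the combination's classifiers agrees on the
    ground-truth label.
    """
    correct = sum(
        1
        for preds, gt in zip(per_action_predictions, ground_truth)
        if strict_majority_vote(preds) == gt
    )
    return correct / len(ground_truth)
```

Because the per-classifier predictions are computed once, estimating the accuracy of any subset of techniques reduces to re-running this cheap vote over cached labels instead of retraining a classifier per combination.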
Acknowledgements
This research is supported by the Czech Science Foundation project No. GA19-02033S.
Communicated by B. Þór Jónsson.
Cite this article
Sedmidubsky, J., Zezula, P. Efficient combination of classifiers for 3D action recognition. Multimedia Systems 27, 941–952 (2021). https://doi.org/10.1007/s00530-021-00767-9