Abstract
Hand gesture and action recognition have been extensively researched in the past two decades due to the emerging advanced acquisition and interaction technologies, which open the floodgates for a vast range of potential applications. Particularly, many spatial–temporal feature extractors have been proposed, such as RNNs-based models, temporal convolutional network (TCN), and 3D convolutional neural networks (3DCNN) for modeling long-term dependencies in sequential data. However, it remains challenging to obtain a high recognition rate because of the difficulty of effectively extracting spatial–temporal features and efficiently classifying them with noisy and complex skeleton sequences. Therefore, this paper proposes a deep ensemble framework called multi-model ensemble gesture recognition network (MMEGRN) for skeleton-based hand gesture recognition. Specifically, to establish effective feature extraction and accurate gesture recognition, we propose an architecture consisting of four sub-networks, three spatio-temporal features classifiers to leverage their various capabilities of extracting and classifying skeleton sequences. Through late feature fusion, the features resulted from the feature extractors of each sub-network are fused into a new fusion classifier. Each subnetwork is trained independently to perform the task of gesture recognition using only skeleton joints. The training is performed using the cyclic annealing learning rate to generate a series of models that are combined in an ensemble using the optimized weighted ensemble (OWE) method. The proposed framework combines deep learning and ensemble strengths to establish a new deep-learning network architecture for more accurate and efficient hand gesture recognition. Extensive experiments on three skeleton-based hand gesture recognition datasets have shown the effectiveness of the proposed framework and the superiority over other models in terms of recognition accuracy.





Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Ashukha A, Lyzhov A, Molchanov D, Vetrov D (2020) Pitfalls of in-domain uncertainty estimation and ensembling in deep learning. https://arxiv.org/abs/2002.06470
Avola D, Bernardi M, Cinque L et al (2019) Exploiting recurrent neural networks and leap motion controller for the recognition of sign language and semaphoric hand gestures. IEEE Trans Multimed. https://doi.org/10.1109/TMM.2018.2856094
Bai S, Kolter JZ, Koltun V (2018) An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. https://arxiv.org/abs/1803.01271
Boulahia SY, Anquetil E, Multon F, Kulpa R (2018) Dynamic hand gesture recognition based on 3D pattern assembled trajectories. In: Proceedings of the 7th international conference on image processing theory, tools and applications, IPTA 2017, pp 1–6
Brown G, Wyatt J, Harris R, Yao X (2005) Diversity creation methods: a survey and categorisation. Inf Fusion. https://doi.org/10.1016/j.inffus.2004.04.004
Cao Z, Hidalgo Martinez G, Simon T et al (2019) OpenPose: realtime multi-person 2D pose estimation using part affinity fields. IEEE Trans Pattern Anal Mach Intell. https://doi.org/10.1109/tpami.2019.2929257
Chen X, Wang G, Guo H et al (2019a) MFA-Net: motion feature augmented network for dynamic hand gesture recognition from skeletal data. Sensors 19:239. https://doi.org/10.3390/s19020239
Chen X, Wang G, Guo H et al (2019b) MFA-Net: motion feature augmented network for dynamic hand gesture recognition from skeletal data. Sensors. https://doi.org/10.3390/s19020239
Chen Y, Zhao L, Peng X et al (2020) Construct dynamic graphs for hand gesture recognition via spatial-temporal attention. In: 30th British machine vision conference 2019, BMVC 2019, pp 48.1–48.13
Chollet F et al (2015) Keras. https://github.com/fchollet/keras
De Smedt Q (2017) Dynamic hand gesture recognition-from traditional handcrafted to recent deep learning approaches. Université de Lille 1
De Smedt Q, Wannous H, Vandeborre JP (2016) Skeleton-based dynamic hand gesture recognition. In: IEEE computer society conference on computer vision and pattern recognition workshops, pp 1–9
De Smedt Q, Wannous H, Vandeborre JP et al (2017) SHREC’17 track: 3D hand gesture recognition using a depth and skeletal dataset. In: Eurographics workshop on 3D object retrieval, EG 3DOR, pp 1–6
Devanne M, Wannous H, Berretti S et al (2015) 3-D human action recognition by shape analysis of motion trajectories on riemannian manifold. IEEE Trans Cybern. https://doi.org/10.1109/TCYB.2014.2350774
Devineau G, Moutarde F, Xi W, Yang J (2018) Deep learning for hand gesture recognition on skeletal data. In: Proceedings of 13th IEEE international conference on automatic face and gesture recognition, FG 2018, pp 106–113
Dietterich TG (2000) Ensemble methods in machine learning: multiple classifier systems. Springer, Berlin
Doosti B (2019) Hand pose estimation: a survey. https://arxiv.org/abs/1903.01013
El-Baz AH, Tolba AS (2013) An efficient algorithm for 3D hand gesture recognition using combined neural classifiers. Neural Comput Appl. https://doi.org/10.1007/s00521-012-0844-2
Hashem S (1997) Optimal linear combinations of neural networks. Neural Netw. https://doi.org/10.1016/S0893-6080(96)00098-6
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, IEEE Computer Society, pp 770–778
Hou J, Wang G, Chen X et al (2019) Spatial-temporal attention res-TCN for skeleton-based dynamic hand gesture recognition. In: Proceedings of the European conference on computer vision (ECCV) workshops, pp 273–286
Huang G, Li Y, Pleiss G et al (2017) Snapshot ensembles: train 1, get M for free. In: 5th International conference on learning representations, ICLR 2017
Ji S, Xu W, Yang M, Yu K (2013) 3D convolutional neural networks for human action recognition. IEEE Trans Pattern Anal Mach Intell. https://doi.org/10.1109/TPAMI.2012.59
Kingma DP, Ba JL (2015) Adam: a method for stochastic optimization. In: 3rd International conference on learning representations, ICLR 2015
Kobylarz J, Bird JJ, Faria DR et al (2020) Thumbs up, thumbs down: non-verbal human-robot interaction through real-time EMG classification via inductive and supervised transductive transfer learning. J Ambient Intell Humaniz Comput. https://doi.org/10.1007/s12652-020-01852-z
Kong Y, Li L, Zhang K et al (2019) Attention module-based spatial–temporal graph convolutional networks for skeleton-based action recognition. J Electron Imaging. https://doi.org/10.1117/1.jei.28.4.043032
Kraft D (1988) A software package for sequential quadratic programming. Dfvlr-Fb. http://degenerateconic.com/wp-content/uploads/2018/03/DFVLR_FB_88_28.pdf
Lai K, Yanushkevich S (2020) An ensemble of knowledge sharing models for dynamic hand gesture recognition. In: Proceedings of the international joint conference on neural networks, pp 1–7
Liu M, Liu H, Chen C (2017) Enhanced skeleton visualization for view invariant human action recognition. Pattern Recognit 68:346–362. https://doi.org/10.1016/j.patcog.2017.02.030
Liu H, Tu J, Liu M, Ding R (2018) Learning explicit shape and motion evolution maps for skeleton-based human action recognition. In: ICASSP, IEEE international conference on acoustics, speech and signal processing - proceedings, Institute of Electrical and Electronics Engineers Inc., pp 1333–1337
Liu J, Liu Y, Wang Y (2020) Decoupled representation learning for skeleton-based gesture recognition. In: IEEE/CVF conference on computer vision and pattern recognition, pp 5751–5760
Lupinetti K, Ranieri A, Giannini F, Monti M (2020) 3D dynamic hand gestures recognition using the Leap Motion sensor and convolutional neural networks. https://arxiv.org/abs/2003.01450
Ma C, Wang A, Chen G, Xu C (2018) Hand joints-based gesture recognition for noisy dataset using nested interval unscented Kalman filter with LSTM network. Vis Comput. https://doi.org/10.1007/s00371-018-1556-0
Maghoumi M, LaViola JJ (2019) DeepGRU: deep gesture recognition utility. In: International symposium on visual computing, pp 16–31
Mohammed AAQ, Lv J, Islam MDS (2019a) A deep learning-based end-to-end composite system for hand detection and gesture recognition. Sensors. https://doi.org/10.3390/s19235282
Mohammed AAQ, Lv J, Islam MS (2019b) Small deep learning models for hand gesture recognition. In: Proceedings of 2019 IEEE international conference on parallel and distributed processing with applications, big data and cloud computing, sustainable computing and communications, social computing and networking, ISPA/BDCloud/SustainCom/SocialCom 2019, pp 1429–1435
Nguyen XS, Brun L, Lezoray O, Bougleux S (2019) A neural network based on spd manifold learning for skeleton-based hand gesture recognition. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, pp 12036–12045
Núñez JC, Cabido R, Pantrigo JJ et al (2018) Convolutional neural networks and long short-term memory for skeleton-based human activity and hand gesture recognition. Pattern Recognit. https://doi.org/10.1016/j.patcog.2017.10.033
Ohn-Bar E, Trivedi MM (2013) Joint angles similarities and HOG2 for action recognition. In: IEEE computer society conference on computer vision and pattern recognition workshops, pp 465–470
Oord A van den, Dieleman S, Zen H et al (2016) WaveNet: a generative model for raw audio based on PixelCNN architecture. https://arxiv.org/abs/1609.03499
Oreifej O, Liu Z (2013) HON4D: Histogram of oriented 4D normals for activity recognition from depth sequences. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, pp 716–723
Ponti MP (2011) Combining classifiers: from the creation of ensembles to the decision fusion. In: Proceedings of 24th SIBGRAPI conference on graphics, patterns, and images tutorials, SIBGRAPI-T 2011, pp 1–10
Shahhosseini M, Hu G, Pham H (2019) Optimizing ensemble weights and hyperparameters of machine learning models for regression problems. https://arxiv.org/abs/1908.05287
Shi X, Chen Z, Wang H et al (2015) Convolutional LSTM network: a machine learning approach for precipitation nowcasting. In: Advances in neural information processing systems, pp 802–810
Shin S, Kim WY (2020) Skeleton-based dynamic hand gesture recognition using a part-based GRU-RNN for gesture-based interface. IEEE Access 8:50236–50243. https://doi.org/10.1109/ACCESS.2020.2980128
Simonyan K, Zisserman A (2014) Two-stream convolutional networks for action recognition in videos. In: Advances in neural information processing systems, pp 568–576
Szegedy C, Liu W, Jia Y et al (2015) Going deeper with convolutions. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, pp 1–9
Wang H, Wang L (2017) Modeling temporal dynamics and spatial configurations of actions using two-stream recurrent neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 499–508
Wang GW, Zhang C, Zhuang J (2012) An application of classifier combination methods in hand gesture recognition. Math Probl Eng. https://doi.org/10.1155/2012/346951
Wu J, Ishwar P, Konrad J (2016) Two-stream CNNs for gesture-based verification and identification: learning user style. In: IEEE computer society conference on computer vision and pattern recognition workshops, IEEE Computer Society, pp 110–118
Xia L, Chen CC, Aggarwal JK (2012) View invariant human action recognition using histograms of 3D joints. In: IEEE computer society conference on computer vision and pattern recognition workshops, pp 20–27
Yan S, Xiong Y, Lin D (2018) Spatial temporal graph convolutional networks for skeleton-based action recognition. In: 32nd AAAI conference on artificial intelligence, AAAI 2018, pp 7444–7452
Yang F, Wu Y, Sakti S, Nakamura S (2019) Make skeleton-based action recognition model smaller, faster and better. In: 1st ACM international conference on multimedia in Asia, MMAsia 2019, pp 1–6
Zhang S, Yang Y, Xiao J et al (2018) Fusing geometric features for skeleton-based action recognition using multilayer LSTM networks. IEEE Trans Multimed. https://doi.org/10.1109/TMM.2018.2802648
Zhu W, Lan C, Xing J et al (2016) Co-Occurrence feature learning for skeleton based action recognition using regularized deep LSTM networks. In: 30th AAAI conference on artificial intelligence, AAAI 2016, pp 3697–3704
Funding
This work is supported by the National Natural Science Foundation of China (Grant No. U1831121), and the Science and Technology Major Project of Sichuan province (Grant Nos. 2019ZDZX0006 and 2020YFG0478).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Mohammed, A.A.Q., Lv, J., Islam, M.S. et al. Multi-model ensemble gesture recognition network for high-accuracy dynamic hand gesture recognition. J Ambient Intell Human Comput 14, 6829–6842 (2023). https://doi.org/10.1007/s12652-021-03546-6
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s12652-021-03546-6