Towards wide-scale continuous gesture recognition model for in-depth and grayscale input videos

Mahmoud, Rihem; Belgacem, Selma; Omri, Mohamed Nazih

doi:10.1007/s13042-020-01227-y

Original Article
Published: 02 January 2021

Volume 12, pages 1173–1189, (2021)
International Journal of Machine Learning and Cybernetics

Abstract

In recent years, gesture recognition in video sequences has attracted growing interest in computer vision and behavior understanding, with applications such as robot and video game control, video surveillance, automatic video indexing, and content-based video retrieval. Processing large-scale continuous gesture data from depth and grayscale input videos remains a primary challenge for researchers. A wide range of recognition models have been proposed to solve this problem, but they have not demonstrated strong performance. The main contribution of this article has three parts. First, sequences of continuous gestures are segmented into isolated gestures using the average of the velocity information computed from a deep optical flow estimate. Second, a set of relevant descriptors, called signature characteristics, is extracted to capture different intensities and spatial information describing the location, speed, and orientation of movement. Finally, for each isolated segment, the characteristics built from the depth and grayscale sequences are passed to a linear SVM for classification. An experimental study on the standard KTH, Chalearn, and Weizmann data collections, comparing our model with the main models studied in the literature, clearly shows the limits of those models and confirms the performance and efficiency of our model in terms of precision, recall, and robustness.
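The segmentation and descriptor steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the optical flow has already been estimated and is supplied as a NumPy array of per-pixel (dx, dy) displacements, it cuts the sequence wherever the mean velocity falls to or below a fixed threshold (the abstract does not state the actual segmentation rule), and the function names `mean_velocity`, `segment_gestures`, and `orientation_histogram` are hypothetical. The descriptor shown is a plain magnitude-weighted orientation histogram, used here only as a stand-in for the paper's signature characteristics.

```python
import numpy as np


def mean_velocity(flows):
    """Average optical-flow magnitude per frame.

    flows: array of shape (T, H, W, 2) holding (dx, dy) for every pixel.
    Returns an array of shape (T,).
    """
    flows = np.asarray(flows, dtype=float)
    magnitude = np.linalg.norm(flows, axis=-1)  # shape (T, H, W)
    return magnitude.mean(axis=(1, 2))


def segment_gestures(velocity, threshold, min_length=1):
    """Return (start, end) frame ranges, end exclusive, in which the mean
    velocity stays above `threshold`. The threshold rule is an assumption."""
    segments = []
    start = None
    for t, v in enumerate(velocity):
        if v > threshold and start is None:
            start = t  # motion begins
        elif v <= threshold and start is not None:
            if t - start >= min_length:
                segments.append((start, t))  # motion ends
            start = None
    # Close a segment that runs to the end of the sequence.
    if start is not None and len(velocity) - start >= min_length:
        segments.append((start, len(velocity)))
    return segments


def orientation_histogram(flows, bins=8):
    """Magnitude-weighted histogram of flow orientations, L1-normalised."""
    flows = np.asarray(flows, dtype=float)
    dx = flows[..., 0].ravel()
    dy = flows[..., 1].ravel()
    angles = np.arctan2(dy, dx)  # values in [-pi, pi]
    weights = np.hypot(dx, dy)   # speed of each flow vector
    hist, _ = np.histogram(
        angles, bins=bins, range=(-np.pi, np.pi), weights=weights
    )
    total = hist.sum()
    return hist / total if total > 0 else hist


# Synthetic example: two bursts of motion separated by still frames.
flows = np.zeros((10, 4, 4, 2))
flows[2:5, ..., 0] = 2.0   # frames 2-4 move mostly to the right
flows[2:5, ..., 1] = 1.0
flows[7:9, ..., 1] = -1.0  # frames 7-8 move along the negative y axis

velocity = mean_velocity(flows)
segments = segment_gestures(velocity, threshold=0.5)
print(segments)  # [(2, 5), (7, 9)]

# One descriptor vector per isolated segment.
descriptors = [orientation_histogram(flows[s:e]) for s, e in segments]
```

In a full pipeline, descriptor vectors of this kind, computed separately for the depth and grayscale streams, would be concatenated and passed to a linear SVM for classification, as the abstract describes.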

Author information

Authors and Affiliations

MARS Research Lab LR 17ES05, Higher Institute of Computer Science and Communication Techniques of Hammem Sousse, University of Sousse, Sousse, Tunisia
Rihem Mahmoud, Selma Belgacem & Mohamed Nazih Omri

Corresponding author

Correspondence to Rihem Mahmoud.

About this article

Cite this article

Mahmoud, R., Belgacem, S. & Omri, M.N. Towards wide-scale continuous gesture recognition model for in-depth and grayscale input videos. Int. J. Mach. Learn. & Cyber. 12, 1173–1189 (2021). https://doi.org/10.1007/s13042-020-01227-y

Received: 17 April 2020
Accepted: 22 October 2020
Published: 02 January 2021
Issue Date: April 2021
DOI: https://doi.org/10.1007/s13042-020-01227-y
