Abstract
Alongside verbal communication, non-verbal interaction plays a significant role in conveying meaningful information. Non-verbal cues mainly comprise gestures, body postures, and facial expressions. Hand gestures constitute the preferred mechanism for non-verbal communication, and they are now also used in human–computer interaction (HCI), gaming, virtual reality, robotics, and sign language, among other areas. Although deep learning has been studied extensively for hand gesture recognition, few efforts have exploited the sparsity of deeply learned features to distinguish hand postures under challenges such as varying hand sizes, diverse spatial positions within images, and background clutter. We demonstrate the effect of data augmentation, transfer learning, and sparsity on the performance of the proposed algorithm using publicly available hand gesture datasets. We also provide a quantitative comparison of the proposed approach with state-of-the-art algorithms for static hand gesture recognition. A noteworthy finding is that dictionary learning through LC-KSVD, applied to fine-tuned features extracted from a deep architecture, outperforms state-of-the-art architectures for hand gesture classification. The proposed methodology also yields substantial improvements over a baseline convolutional model. For instance, on the EgoGesture dataset, we attained an accuracy of \(94.9\%\), compared with the baseline accuracy of \(63.3\%\), by exploiting sparsity in deep features.
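To make the idea of classifying with sparse codes concrete, here is a minimal sketch. It is not the LC-KSVD pipeline reported in the article. It assumes a small fixed dictionary whose atoms carry class labels (the toy 4-D vectors and the labels "fist" and "palm" are hypothetical stand-ins for deep features and gesture classes), it uses plain matching pursuit rather than a learned discriminative dictionary, and it assigns the class whose atoms give the smallest reconstruction error.

```python
import math


def normalize(vec):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def matching_pursuit(signal, atoms, n_nonzero):
    """Greedy matching pursuit over unit-norm atoms.

    Runs n_nonzero greedy steps, so at most n_nonzero coefficients
    end up non-zero. Returns one coefficient per atom.
    """
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_nonzero):
        # Correlation of the current residual with every atom.
        corrs = [sum(r * a for r, a in zip(residual, atom)) for atom in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(corrs[i]))
        coeffs[k] += corrs[k]
        # Remove the selected atom's contribution from the residual.
        residual = [r - corrs[k] * a for r, a in zip(residual, atoms[k])]
    return coeffs


def classify(signal, atoms, atom_labels, n_nonzero=2):
    """Pick the class whose atoms best reconstruct the signal."""
    coeffs = matching_pursuit(signal, atoms, n_nonzero)
    best_label, best_err = None, float("inf")
    for label in sorted(set(atom_labels)):
        # Reconstruct using only the coefficients of this class's atoms.
        recon = [0.0] * len(signal)
        for c, atom, lab in zip(coeffs, atoms, atom_labels):
            if lab == label:
                recon = [r + c * a for r, a in zip(recon, atom)]
        err = sum((s - r) ** 2 for s, r in zip(signal, recon))
        if err < best_err:
            best_label, best_err = label, err
    return best_label


# Hypothetical toy dictionary: two atoms per gesture class.
atoms = [
    normalize([1.0, 0.1, 0.0, 0.0]),
    normalize([0.9, 0.3, 0.0, 0.0]),
    normalize([0.0, 0.0, 1.0, 0.2]),
    normalize([0.0, 0.1, 0.8, 0.5]),
]
atom_labels = ["fist", "fist", "palm", "palm"]

fist_like = [0.95, 0.2, 0.05, 0.0]
palm_like = [0.0, 0.05, 0.9, 0.3]
pred_fist = classify(fist_like, atoms, atom_labels)
pred_palm = classify(palm_like, atoms, atom_labels)
```

In a setting closer to the one described above, the feature vectors would come from a fine-tuned network and the dictionary would be learned from training data. The residual-based decision rule used here is only one possible choice for the final classification step.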
Ethics declarations
Conflict of interest
The work was primarily carried out at the Indian Institute of Technology, Kharagpur, India, as part of a research thesis and was not funded by any external agency.
About this article
Cite this article
Mohanty, A., Roy, K. & Sahay, R.R. Robust static hand gesture recognition: harnessing sparsity of deeply learned features. Vis Comput 40, 6507–6531 (2024). https://doi.org/10.1007/s00371-023-03179-0
DOI: https://doi.org/10.1007/s00371-023-03179-0