
Spike representation of depth image sequences and its application to hand gesture recognition with spiking neural network

  • Original Paper
  • Published in Signal, Image and Video Processing

Abstract

Hand gestures play an important role in expressing people's emotions and communicating their intentions, and various methods have therefore been studied to capture and interpret them. Artificial neural networks (ANNs) are widely used for gesture recognition owing to their expressive power and ease of implementation. However, the task remains challenging because it demands abundant training data and considerable energy for computation. Recently, low-power neuromorphic devices that run spiking neural networks (SNNs), which can process temporal information at low power consumption, have attracted significant research interest. In this study, we present a method for representing human hand gestures as spike trains and analyzing them with SNNs. The SNN comprises multiple convolutional layers; when a sequence of spike trains corresponding to a hand gesture is input, the spiking neurons in the output layer fire, and the gesture is classified from their firing frequencies. Using sequences of depth images of hand gestures, we investigated a method for generating spike trains from the training image data. The gestures could be classified by training the SNN with surrogate gradient (SG) learning. Moreover, converting the depth image data into spike trains reduced the training data volume by 68% without significantly degrading classification accuracy compared with ANNs.
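The abstract describes converting depth image sequences into spike trains before feeding them to the SNN. The paper's exact encoding scheme is not given here, so the sketch below illustrates one common approach only: delta modulation, where a spike is emitted whenever a pixel's intensity changes by more than a threshold between consecutive frames. The `delta_encode` function, the toy moving-square sequence, and the `threshold` value are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def delta_encode(frames, threshold=0.05):
    """Convert a depth-image sequence of shape (T, H, W), values in [0, 1],
    into ON/OFF spike trains by thresholding inter-frame intensity changes."""
    diffs = np.diff(frames, axis=0)              # (T-1, H, W) frame-to-frame change
    on = (diffs > threshold).astype(np.uint8)    # spike where intensity rose
    off = (diffs < -threshold).astype(np.uint8)  # spike where intensity fell
    return on, off

# Toy sequence: a bright 4x4 square moving one pixel per frame.
T, H, W = 8, 16, 16
frames = np.zeros((T, H, W))
for t in range(T):
    frames[t, 4:8, 4 + t:8 + t] = 1.0

on, off = delta_encode(frames)
# Only the moving edges produce spikes, so the spike representation is far
# sparser than the dense frames -- the kind of sparsity behind the data
# reduction the abstract reports.
density = (on.sum() + off.sum()) / frames.size
print(f"spike density: {density:.3f}")
```

Because only changing pixels generate events, static background contributes nothing to the encoded stream, which is why this style of encoding can shrink the stored training data substantially.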


Data Availability

The data that support the findings of this study are not openly available. Data may be available upon reasonable request (http://www-rech.telecom-lille.fr/DHGdataset).


Funding

This work was supported by the Japan Society for the Promotion of Science (JSPS) Grants-in-Aid for Scientific Research, Grant Number 22K17937.

Author information


Contributions

Daisuke Miki wrote the main manuscript text and Kento Kamitsuma and Taiga Matsunaga prepared Tables 1, 2, 3, 4. All authors reviewed the manuscript.

Corresponding author

Correspondence to Daisuke Miki.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Ethical Approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Miki, D., Kamitsuma, K. & Matsunaga, T. Spike representation of depth image sequences and its application to hand gesture recognition with spiking neural network. SIViP 17, 3505–3513 (2023). https://doi.org/10.1007/s11760-023-02574-3
