Spike representation of depth image sequences and its application to hand gesture recognition with spiking neural network

Miki, Daisuke; Kamitsuma, Kento; Matsunaga, Taiga

doi:10.1007/s11760-023-02574-3

Original Paper
Published: 24 April 2023

Volume 17, pages 3505–3513 (2023)

Signal, Image and Video Processing

Abstract

Hand gestures play an important role in expressing emotions and communicating intentions, and various methods have therefore been studied to capture and interpret them. Artificial neural networks (ANNs) are widely used for gesture recognition owing to their expressive power and ease of implementation. However, the task remains challenging because it requires abundant data and considerable energy for computation. Recently, low-power neuromorphic devices that use spiking neural networks (SNNs), which can process temporal information at lower computational power, have attracted significant research interest. In this study, we present a method for representing human hand gestures as spikes and analyzing them using SNNs. The SNN comprises multiple convolutional layers; when a sequence of spike trains corresponding to a hand gesture is input, the spiking neurons in the output layer that correspond to each gesture fire, and the gesture is classified based on their firing frequencies. Using sequences of depth images of hand gestures, we investigated a method for generating spike trains from the training image data. The gestures could be classified by training the SNN using surrogate gradient (SG) learning. Additionally, converting the depth image data into spike trains reduced the training data volume by 68% without significantly reducing the classification accuracy relative to that obtained with ANNs.
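The two ends of the pipeline described in the abstract, turning a depth image sequence into spike trains and reading a class off the output layer's firing frequencies, can be illustrated with a minimal sketch. The encoding below is an assumption: it uses thresholded frame differencing, one common way to derive spikes from video, and the abstract does not say whether the authors use this scheme. The names `depth_to_spikes`, `classify_by_firing_rate`, and `threshold` are hypothetical, and the convolutional SNN and its surrogate gradient training are not modeled here.

```python
from typing import List, Sequence

Frame = List[List[float]]  # one depth image as rows of depth values


def depth_to_spikes(frames: Sequence[Frame], threshold: float) -> List[List[List[int]]]:
    """Delta-encode a depth sequence into signed spikes.

    Assumption: a pixel emits +1 when its depth rises by more than
    `threshold` between consecutive frames, -1 when it falls by more
    than `threshold`, and 0 otherwise. The paper's encoding may differ.
    """
    spikes = []
    for prev, curr in zip(frames, frames[1:]):
        step = []
        for row_prev, row_curr in zip(prev, curr):
            row = []
            for before, after in zip(row_prev, row_curr):
                diff = after - before
                if diff > threshold:
                    row.append(1)
                elif diff < -threshold:
                    row.append(-1)
                else:
                    row.append(0)
            step.append(row)
        spikes.append(step)
    return spikes


def classify_by_firing_rate(output_spikes: Sequence[Sequence[int]]) -> int:
    """Return the index of the output neuron that fired most often.

    output_spikes[t][k] is 1 if output neuron k fired at time step t.
    """
    num_classes = len(output_spikes[0])
    counts = [sum(step[k] for step in output_spikes) for k in range(num_classes)]
    return max(range(num_classes), key=counts.__getitem__)
```

With this encoding, static pixels produce no spikes at all, which hints at why a spike representation can be more compact than the raw frames; the 68% figure in the abstract comes from the authors' own method and data, not from this sketch.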

Data Availability

The data that support the findings of this study are not openly available. Data may be available (http://www-rech.telecom-lille.fr/DHGdataset) upon reasonable request.

Funding

This work was supported by the Japan Society for the Promotion of Science (JSPS) Grants-in-Aid for Scientific Research, Grant Number 22K17937.

Author information

Kento Kamitsuma and Taiga Matsunaga contributed equally to this work.

Authors and Affiliations

Department of Computer Science, Chiba Institute of Technology, 2-17-1, Tsudanuma, Narashino, Chiba, 2750016, Japan
Daisuke Miki, Kento Kamitsuma & Taiga Matsunaga

Contributions

Daisuke Miki wrote the main manuscript text, and Kento Kamitsuma and Taiga Matsunaga prepared Tables 1, 2, 3, and 4. All authors reviewed the manuscript.

Corresponding author

Correspondence to Daisuke Miki.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Ethical Approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Miki, D., Kamitsuma, K. & Matsunaga, T. Spike representation of depth image sequences and its application to hand gesture recognition with spiking neural network. SIViP 17, 3505–3513 (2023). https://doi.org/10.1007/s11760-023-02574-3

Received: 18 November 2022
Revised: 27 January 2023
Accepted: 23 March 2023
Published: 24 April 2023
Issue Date: October 2023
DOI: https://doi.org/10.1007/s11760-023-02574-3
