Abstract
Non-intrusive, vision-based applications that support communication for people who use sign language remain an open and attractive research field for the human action recognition community. Automatic sign language interpretation is a complex visual recognition task in which motion across time distinguishes the sign being performed. In recent years, the development of robust and successful deep-learning techniques has been accompanied by the creation of a large number of databases. The availability of challenging datasets of Sign Language (SL) terms and phrases pushes research toward new algorithms and methods for their automatic recognition. This paper presents ‘SL-Animals-DVS’, an event-based action dataset captured by a Dynamic Vision Sensor (DVS). The DVS records, as a continuous spike flow at very low latency, non-fluent signers performing a small set of isolated words derived from SL signs of various animals. This makes it especially well suited to SL signs, which are usually made at very high speed. We benchmark recognition performance on this data using three state-of-the-art Spiking Neural Network (SNN) recognition systems. SNNs are naturally suited to exploiting the temporal information provided by the DVS, where information is encoded in the spike times. The dataset has about 1100 samples of 59 subjects performing 19 sign language signs in isolation in different scenarios, providing a challenging evaluation platform for this emerging technology.
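To make the idea of information encoded in spike times concrete, the following is a minimal sketch of how a DVS event stream could be grouped into fixed time bins before being fed to a recognition system. It is an illustration only, not the pipeline used in the paper: the `Event` fields, the `bin_events` helper, the 4x4 sensor size, and the 1000 µs bin width are all assumptions chosen for the example.

```python
from typing import List, NamedTuple


class Event(NamedTuple):
    """One DVS event (hypothetical layout): pixel address, timestamp, polarity."""
    x: int
    y: int
    t_us: int      # timestamp in microseconds
    polarity: int  # 1 = brightness increase (ON), 0 = decrease (OFF)


def bin_events(events: List[Event], width: int, height: int, bin_us: int):
    """Accumulate events into per-bin spike counts.

    Returns a nested list indexed as [bin][polarity][y][x].
    An empty stream yields an empty list.
    """
    if not events:
        return []
    t0 = min(e.t_us for e in events)
    t1 = max(e.t_us for e in events)
    n_bins = (t1 - t0) // bin_us + 1
    frames = [
        [[[0] * width for _ in range(height)] for _ in range(2)]
        for _ in range(n_bins)
    ]
    for e in events:
        b = (e.t_us - t0) // bin_us  # time bin relative to the first event
        frames[b][e.polarity][e.y][e.x] += 1
    return frames


# Hypothetical toy stream on a 4x4 sensor (not taken from the dataset).
toy_events = [
    Event(0, 0, 0, 1),
    Event(1, 0, 400, 1),
    Event(1, 0, 900, 0),
    Event(3, 2, 1500, 1),
]
toy_frames = bin_events(toy_events, width=4, height=4, bin_us=1000)
```

A smaller bin width preserves more of the timing information at the cost of a longer, sparser sequence. The right trade-off depends on the recognition system and is not something this sketch attempts to settle.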
Notes
Because we are interested in the analysis of a stream of temporal events, we will not consider static images.
Acknowledgements
This work was funded by EU H2020 grants 824164 “HERMES”, 871371 “Memscales” and 871501 “NeuroNN”, and by the Spanish Ministry of Economy and Competitiveness grant NANOMIND-PID2019-105556GB-C31, with support from the European Regional Development Fund. P. Negri was partially supported by the Scholarship Program Mobility of the Postgraduate Iberoamerican University Association (AUIP). C. Di Ielsi was supported by a scholarship from the Computer Department of the University of Buenos Aires. A. Vasudevan was supported by the MINECO FPI scholarship BES-2016-077757.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Vasudevan, A., Negri, P., Di Ielsi, C. et al. SL-Animals-DVS: event-driven sign language animals dataset. Pattern Anal Applic 25, 505–520 (2022). https://doi.org/10.1007/s10044-021-01011-w
DOI: https://doi.org/10.1007/s10044-021-01011-w