SL-Animals-DVS: event-driven sign language animals dataset

Vasudevan, Ajay; Negri, Pablo; Di Ielsi, Camila; Linares-Barranco, Bernabe; Serrano-Gotarredona, Teresa

doi:10.1007/s10044-021-01011-w

SL-Animals-DVS: event-driven sign language animals dataset

Original Article
Published: 16 July 2021

Volume 25, pages 505–520, (2022)
Cite this article

Pattern Analysis and Applications Aims and scope Submit manuscript

Ajay Vasudevan¹,
Pablo Negri ORCID: orcid.org/0000-0003-0250-5208^2,3,
Camila Di Ielsi²,
Bernabe Linares-Barranco¹ &
…
Teresa Serrano-Gotarredona¹

773 Accesses
5 Citations
1 Altmetric
Explore all metrics

Abstract

Non-intrusive visual-based applications supporting the communication of people employing sign language for communication are always an open and attractive research field for the human action recognition community. Automatic sign language interpretation is a complex visual recognition task where motion across time distinguishes the sign being performed. In recent years, the development of robust and successful deep-learning techniques has been accompanied by the creation of a large number of databases. The availability of challenging datasets of Sign Language (SL) terms and phrases helps to push the research to develop new algorithms and methods to tackle their automatic recognition. This paper presents ‘SL-Animals-DVS’, an event-based action dataset captured by a Dynamic Vision Sensor (DVS). The DVS records non-fluent signers performing a small set of isolated words derived from SL signs of various animals as a continuous spike flow at very low latency. This is especially suited for SL signs which are usually made at very high speeds. We benchmark the recognition performance on this data using three state-of-the-art Spiking Neural Networks (SNN) recognition systems. SNNs are naturally compatible to make use of the temporal information that is provided by the DVS where the information is encoded in the spike times. The dataset has about 1100 samples of 59 subjects performing 19 sign language signs in isolation at different scenarios, providing a challenging evaluation platform for this emerging technology.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

The Korean Sign Language Dataset for Action Recognition

Event-Based American Sign Language Recognition Using Dynamic Vision Sensor

An AI-based Approach for Improved Sign Language Recognition using Multiple Videos

Article 28 February 2022

Notes

Because we are interested in the analysis of a stream of temporal events, we will not consider static images.

References

DECOLLE implementetion code. https://github.com/nmi-lab/decolle-public. Accessed 09 July 2021
jaer open source project: Real time sensory-motor processing for event-based sensors and systems. http://www.jaerproject.org. Accessed: 09 July 2021
SL-Animals-DVS dataset. http://www2.imse-cnm.csic.es/neuromorphs/index.php/SL-ANIMALS-DVS-Database. Accessed: 09 July 2021
Amaral L, Júnior GL, Vieira T, Vieira T (2018) Evaluating deep models for dynamic Brazilian sign language recognition. In: Iberoamerican congress on pattern recognition, pp. 930–937. Springer. https://doi.org/10.1007/978-3-030-13469-3_107
Amir A, Taba B, Berg D, Melano T, McKinstry J, Di Nolfo C, Nayak T, Andreopoulos A, Garreau G, Mendoza M et al (2017) A low power, fully event-based gesture recognition system. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 7243–7252 (2017). https://doi.org/10.1109/CVPR.2017.781
Baranwal N, Nandi GC (2017) An efficient gesture based humanoid learning using wavelet descriptor and mfcc techniques. Int J Mach Learn Cybern 8(4):1369–1388. https://doi.org/10.1007/s13042-016-0512-4
Article Google Scholar
Bellugi U, Klima E (2001) Sign language. In: Smelser NJ, Baltes PB (eds) International encyclopedia of the social and behavioral sciences, pp. 14066–14071. Pergamon, Oxford. https://doi.org/10.1016/B0-08-043076-7/02940-5
Camuñas-Mesa LA, Linares-Barranco B, Serrano-Gotarredona T (2019) Neuromorphic spiking neural networks and their memristor-cmos hardware implementations. Materials 12(17):2745. https://doi.org/10.3390/ma12172745
Article Google Scholar
Canales E (2021) iAPRENDE A SIGNAR! LSE (20 Animales). https://www.youtube.com/watch?v=IRue9cRhsDk. Accessed: 9 July
Caselli NK, Sehyr ZS, Cohen-Goldberg AM, Emmorey K (2017) ASL-LEX: a lexical database of American sign language. Behav Res Methods 49(2):784–801. https://doi.org/10.3758/s13428-016-0742-0
Article Google Scholar
Cerna LR, Cardenas EE, Miranda DG, Menotti D, Camara-Chavez G (2021) A multimodal libras-ufop Brazilian sign language dataset of minimal pairs using a microsoft kinect sensor. Exp Syst Appl 167:114179. https://doi.org/10.1016/j.eswa.2020.114179
Article Google Scholar
Chen G, Chen J, Lienen M, Conradt J, Röhrbein F, Knoll AC (2019) FLGR: fixed length GISTS representation learning for RNN-HMM hybrid-based neuromorphic continuous gesture recognition. Front Neurosci 13:73
Article Google Scholar
Cheok MJ, Omar Z, Jaward MH (2019) A review of hand gesture and sign language recognition techniques. Int J Mach Learn Cybern 10(1):131–153. https://doi.org/10.1007/s13042-017-0705-5
Article Google Scholar
Corina D (2001) Sign language: psychological and neural aspects. In: Smelser NJ, Baltes PB (eds) International encyclopedia of the social and behavioral sciences, pp. 14071–14075. Pergamon, Oxford. https://doi.org/10.1016/B0-08-043076-7/03492-6
Dreuw P, Neidle C, Athitsos V, Sclaroff S, Ney H (2008) Benchmark databases for video-based automatic sign language recognition. In: LREC
Emmorey K, Corina D (1990) Lexical recognition in sign language: effects of phonetic structure and morphology. Percept Motor Skills 71(3\_suppl), 1227–1252. https://doi.org/10.2466/pms.1990.71.3f.1227
Eryilmaz SB, Joshi S, Neftci E, Wan W, Cauwenberghs G, Wong HSP (2016) Neuromorphic architectures with electronic synapses. In: 17th international symposium on quality electronic design (ISQED), pp. 118–123. https://doi.org/10.1109/ISQED.2016.7479186
Escalera S, Baró X, Gonzalez J, Bautista MA, Madadi M, Reyes M, Ponce-López V, Escalante HJ, Shotton J, Guyon I (2014) Chalearn looking at people challenge 2014: dataset and results. In: European conference on computer vision, pp. 459–473. Springer. https://doi.org/10.1007/978-3-319-16178-5_32
Forster J, Schmidt C, Koller O, Bellgardt M, Ney H (2014) Extensions of the sign language recognition and translation corpus RWTH-PHOENIX-Weather. In: International conference on language resources and evaluation, pp. 1911–1916
Gerstner W, Kistler WM (2002) Spiking neuron models: single neurons, populations, plasticity. Cambridge University Press, Cambridge
Book Google Scholar
Kaiser J, Mostafa H, Neftci E (2020) Synaptic plasticity dynamics for deep continuous local learning (decolle). Front Neurosci 14:424. https://doi.org/10.3389/fnins.2020.00424
Article Google Scholar
Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980
Li D, Chen X, Becchi M, Zong Z (2016) Evaluating the energy efficiency of deep convolutional neural networks on CPUs and GPUs. In: IEEE international conferences on big data and cloud computing (BDCloud), social computing and networking (SocialCom), sustainable computing and communications (SustainCom), pp. 477–484. https://doi.org/10.1109/BDCloud-SocialCom-SustainCom.2016.76
Liang ZJ, Liao SB, Hu BZ (2018) 3D convolutional neural networks for dynamic sign language recognition. Comput J 61(11):1724–1736. https://doi.org/10.1093/comjnl/bxy049
Lichtsteiner P, Posch C, Delbruck T (2006) A 128*128 120db 15us latency asynchronous temporal contrast vision sensor. pp. 566–576. https://doi.org/10.1109/JSSC.2007.914337
Lungu IA, Corradi F, Delbrück T (2017) Live demonstration: convolutional neural network driven by dynamic vision sensor playing RoShamBo. In: IEEE international symposium on circuits and systems (ISCAS), p 1 . https://doi.org/10.1109/ISCAS.2017.8050403
Maass W (1997) Networks of spiking neurons: the third generation of neural network models. Neural Netw 10(9):1659–1671. https://doi.org/10.1016/S0893-6080(97)00011-7
Article Google Scholar
Maro JM, Ieng SH, Benosman R (2020) Event-based gesture recognition with dynamic background suppression using smartphone computational capabilities. Front Neurosci 14:275
Article Google Scholar
Martínez AM, Wilbur RB, Shay R, Kak AC (2002) Purdue RVL-SLLL ASL database for automatic recognition of American sign language. In: IEEE international conference on multimodal interfaces, pp 167–172. https://doi.org/10.1109/ICMI.2002.1166987
McLeister M (2019) Worship, technology and identity: a deaf protestant congregation in urban China. Stud World Christ 25(2):220–237
Article Google Scholar
Merolla PA, Arthur JV, Alvarez-Icaza R, Cassidy AS, Sawada J, Akopyan F, Jackson BL, Imam N, Guo C, Nakamura Y et al (2014) A million spiking-neuron integrated circuit with a scalable communication network and interface. Science 345(6197):668–673. https://doi.org/10.1126/science.1254642
Article Google Scholar
Mori Y, Toyonaga M (2018) Data-glove for Japanese sign language training system with gyro-sensor. In: Joint 10th international conference on soft computing and intelligent systems (SCIS) and 19th international symposium on advanced intelligent systems (ISIS), pp. 1354–1357. https://doi.org/10.1007/s13042-017-0705-5
Pérez-Carrasco JA, Zhao B, Serrano C, Acha B, Serrano-Gotarredona T, Chen S, Linares-Barranco B (2013) Mapping from frame-driven to frame-free event-driven vision systems by low-rate rate coding and coincidence processing-application to feedforward ConvNets. IEEE Trans Pattern Anal Mach Intell 35(11):2706–2719. https://doi.org/10.1109/TPAMI.2013.71
Article Google Scholar
Posch C, Matolin D, Wohlgenannt R (2010) A QVGA 143 db dynamic range frame-free PWM image sensor with lossless pixel-level video compression and time-domain CDS. IEEE J Solid-State Circuits 46(1):259–275. https://doi.org/10.1109/JSSC.2010.2085952
Article Google Scholar
Serrano-Gotarredona T, Linares-Barranco B (2013) A 128x128 1.5% contrast sensitivity 0.9% fpn 3us latency 4 mw asynchronous frame-free dynamic vision sensor using transimpedance preamplifiers. IEEE J Solid State Circuits 48(3):827–838
Shrestha SB, Orchard G (2018) SLAYER: spike layer error reassignment in time. In: Advances in neural information processing systems, pp 1412–1421
Sivilotti MA (1991) Wiring considerations in analog vlsi systems, with application to field-programmable networks. Ph.D. thesis, Computation and Neural Systems, California Inst. Technol., Pasadena, CA, USA
Troelsgård T, Kristoffersen JH (2008) An electronic dictionary of Danish sign language. In: Theoretical issues in sign language research conference, Florianopolis, Brazil
Upadhyay NK, Jiang H, Wang Z, Asapu S, Xia Q, Joshua Yang J (2019) Emerging memory devices for neuromorphic computing. Adv Mater Technol 4(4):1800589. https://doi.org/10.1002/admt.201800589
Article Google Scholar
Vasudevan A, Negri P, Linares-Barranco B, Serrano-Gotarredona T (2020) Introduction and analysis of an event-based sign language dataset. In: Faces and gestures in E-health and welfare (FaGEW) workshop, 15th IEEE international conference on automatic face and gesture recognition (FG), pp 441–448
Von Agris U, Kraiss KF (2007) Towards a video corpus for signer-independent continuous sign language recognition
Wan J, Zhao Y, Zhou S, Guyon I, Escalera S, Li SZ (2016) Chalearn looking at people RGB-D isolated and continuous datasets for gesture recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 56–64 https://doi.org/10.1109/CVPRW.2016.100
Wang H, Chai X, Hong X, Zhao G, Chen X (2016) Isolated sign language recognition with Grassmann covariance matrices. ACM Trans Access Comput (TACCESS) 8(4):1–21. https://doi.org/10.1145/2897735
Article Google Scholar
Wang Q, Zhang Y, Yuan J, Lu Y (2019) Space-time event clouds for gesture recognition: from rgb cameras to event cameras. In: IEEE winter conference on applications of computer vision (WACV), pp 1826–1835. https://doi.org/10.1109/WACV.2019.00199
Wang X, Lin X, Dang X (2019) A delay learning algorithm based on spike train kernels for spiking neurons. Front Neurosci 13:252. https://doi.org/10.3389/fnins.2019.00252
Article Google Scholar
World Health Organization: Deafness and hearing loss (2019). https://www.who.int/news-room/fact-sheets/detail/deafness-and-hearing-loss. Accessed 09 July 2021
Wu Y, Deng L, Li G, Zhu J, Shi L (2018) Spatio-temporal backpropagation for training high-performance spiking neural networks. Front Neurosci 12:331. https://doi.org/10.3389/fnins.2018.00331
Article Google Scholar
Yousefzadeh A, Khoei MA, Hosseini S, Holanda P, Leroux S, Moreira O, Tapson J, Dhoedt B, Simoens P, Serrano-Gotarredona T et al (2019) Asynchronous spiking neurons, the natural key to exploit temporal sparsity. IEEE J Emerg Sel Top Circuits Syst 9(4):668–678. https://doi.org/10.1109/JETCAS.2019.2951121
Article Google Scholar
Yuan T, Sah S, Ananthanarayana T, Zhang C, Bhat A, Gandhi S, Ptucha R (2019) Large scale sign language interpretation. In: 14th IEEE international conference on automatic face and gesture recognition (FG), pp 1–5. https://doi.org/10.1109/FG.2019.8756506

Download references

Acknowledgements

This work was funded by EU H2020 grants 824164 “HERMES”, 871371 “Memscales” and 871501 “NeuroNN”, Spanish grant from the Ministry of Economy and Competitivity NANOMIND-PID2019-105556GB-C31, with support from the European Regional Development Fund. P. Negri was partially supported by the Scholarship Program Mobility of the Postgraduate Iberoamerican University Association (AUIP). C. Di Ielsi was supported by a scholarship of the Computer Department of the University of Buenos Aires. A. Vasudevan was supported by the MINECO FPI scholarship BES-2016-077757.

Author information

Authors and Affiliations

Instituto de Microelectronica de Sevilla (IMSE-CNM), CSIC and Universidad de Sevilla, Sevilla, Spain
Ajay Vasudevan, Bernabe Linares-Barranco & Teresa Serrano-Gotarredona
Computer Department, University of Buenos Aires (UBA-FCEN), Buenos Aires, Argentina
Pablo Negri & Camila Di Ielsi
Institute of Research in Computer Sciences (ICC), CONICET-UBA, Buenos Aires, Argentina
Pablo Negri

Authors

Ajay Vasudevan
View author publications
You can also search for this author in PubMed Google Scholar
Pablo Negri
View author publications
You can also search for this author in PubMed Google Scholar
Camila Di Ielsi
View author publications
You can also search for this author in PubMed Google Scholar
Bernabe Linares-Barranco
View author publications
You can also search for this author in PubMed Google Scholar
Teresa Serrano-Gotarredona
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Pablo Negri.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Vasudevan, A., Negri, P., Di Ielsi, C. et al. SL-Animals-DVS: event-driven sign language animals dataset . Pattern Anal Applic 25, 505–520 (2022). https://doi.org/10.1007/s10044-021-01011-w

Download citation

Received: 28 August 2020
Accepted: 07 July 2021
Published: 16 July 2021
Issue Date: August 2022
DOI: https://doi.org/10.1007/s10044-021-01011-w

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

SL-Animals-DVS: event-driven sign language animals dataset

Abstract

Access this article

Similar content being viewed by others

The Korean Sign Language Dataset for Action Recognition

Event-Based American Sign Language Recognition Using Dynamic Vision Sensor

An AI-based Approach for Improved Sign Language Recognition using Multiple Videos

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

SL-Animals-DVS: event-driven sign language animals dataset

Abstract

Access this article

Similar content being viewed by others

The Korean Sign Language Dataset for Action Recognition

Event-Based American Sign Language Recognition Using Dynamic Vision Sensor

An AI-based Approach for Improved Sign Language Recognition using Multiple Videos

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation