EarNet: Biometric Embeddings for End to End Person Authentication System Using Transient Evoked Otoacoustic Emission Signals

Neural Processing Letters

Abstract

Transient Evoked Otoacoustic Emissions (TEOAE) are a class of otoacoustic emissions generated by the cochlea in response to an external stimulus. TEOAE signals exhibit characteristics unique to an individual and are therefore considered a potential biometric modality. Unlike conventional modalities, TEOAE is resistant to replay and falsification attacks because of its inherent liveness detection. In this paper, we propose an efficient deep neural network architecture, EarNet, that learns appropriate filters for the non-stationary TEOAE signals, revealing individual uniqueness and long-term reproducibility. EarNet is inspired by Google's FaceNet. The embeddings generated by EarNet in the Euclidean space reduce intra-subject variability while capturing inter-subject variability, as visualized using t-SNE. These embeddings are used for both identification and verification tasks. The K-Nearest Neighbour classifier gives identification accuracies of 99.21% and 99.42% for the left and right ear, respectively, the highest among the machine learning algorithms explored in this work. Verification using Pearson correlation on the embeddings achieves an equal error rate (EER) of 0.581% and 0.057% for the left and right ear, respectively, outperforming all other techniques considered. A fusion strategy yields an improved identification accuracy of 99.92%. The embeddings generalize well to subjects that are not part of the training set, and hence EarNet scales to new, larger datasets.
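The sketch below illustrates the two downstream tasks the abstract describes, run on pre-computed embeddings. It is a minimal illustration, not the authors' implementation: the synthetic arrays, the 128-dimensional embedding size, and the choice of k = 1 are assumptions made for demonstration only.

```python
# Minimal sketch (not the authors' code) of the two evaluation steps in the
# abstract: k-NN identification on EarNet-style embeddings, and verification
# by Pearson correlation with an equal-error-rate (EER) readout.
# All data below is synthetic; shapes and k=1 are illustrative assumptions.
import numpy as np
from scipy.stats import pearsonr
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
enroll_emb = rng.normal(size=(200, 128))    # hypothetical enrolled embeddings
enroll_ids = rng.integers(0, 20, size=200)  # subject label per embedding
probe_emb = rng.normal(size=(50, 128))      # hypothetical probe embeddings
probe_ids = rng.integers(0, 20, size=50)

# Identification: classify each probe by its nearest enrolled embedding
# in Euclidean space, as described for the KNN classifier.
knn = KNeighborsClassifier(n_neighbors=1, metric="euclidean")
knn.fit(enroll_emb, enroll_ids)
print("identification accuracy:", knn.score(probe_emb, probe_ids))

# Verification: score every (probe, enrolled) pair by Pearson correlation,
# then find where the false-accept and false-reject rates cross (the EER).
def equal_error_rate(scores, genuine):
    order = np.argsort(-scores)          # accept highest-scoring pairs first
    genuine = genuine[order]
    far = np.cumsum(~genuine) / max((~genuine).sum(), 1)    # impostors accepted
    frr = 1.0 - np.cumsum(genuine) / max(genuine.sum(), 1)  # genuines rejected
    i = int(np.argmin(np.abs(far - frr)))
    return (far[i] + frr[i]) / 2.0

scores, genuine = [], []
for i in range(len(probe_emb)):
    for j in range(len(enroll_emb)):
        scores.append(pearsonr(probe_emb[i], enroll_emb[j])[0])
        genuine.append(probe_ids[i] == enroll_ids[j])
print("verification EER:", equal_error_rate(np.array(scores), np.array(genuine)))
```

On synthetic data the printed numbers are meaningless; the point is the mechanics: identification is closed-set classification over the embedding space, while verification thresholds a pairwise similarity score, which is how an EER such as those reported above would be read off.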


References

  1. Kemp D (1978) Acoustic resonances originating inside the cochlea. In: British society of audiology short papers meeting, pp 290–294

  2. Watkin PM (1996) Neonatal otoacoustic emission screening and the identification of deafness. Arch Dis Child Fetal Neonatal Ed 74:F16–F25

  3. Hall J (2000) Handbook of otoacoustic emissions (a singular audiology text). Singular Publishing Group, San Diego

  4. Zimatore G, Giuliani A, Hatzopoulos S, Martini A, Colosimo A (2002) Invariant and subject-dependent features of otoacoustic emissions. In: Proceedings of the 3rd international symposium on medical data analysis, pp 158–166

  5. Hall JW, Baer JE, Chase PA, Schwaber MK (2009) Sex differences in distortion-product and transient-evoked otoacoustic emissions compared. J Acoust Soc Am 125:239–246

  6. Bilger RC, Matthies ML, Hammel DR, Demorest ME (1990) Genetic implications of gender differences in the prevalence of spontaneous otoacoustic emissions. J Speech Lang Hear Res 33:418–432

  7. Whitehead ML, Kamal N, Lonsbury-Martin BL, Martin GK (1993) Spontaneous otoacoustic emissions in different racial groups. Scand Audiol 22:3–10

  8. Matsumoto T, Matsumoto H, Yamada K, Hoshino S (2002) Impact of artificial ‘gummy’ fingers on fingerprint systems. Proc SPIE 4677:275–289

  9. Turk M, Pentland A (1991) Eigenfaces for recognition. J Cognit Neurosci 3:71–86

  10. Gao Y, Leung MKH (2002) Face recognition using line edge map. IEEE Trans Pattern Anal Mach Intell 24:764–779

  11. Wiskott L, Fellous J-M, Krüger N, von der Malsburg C (1997) Face recognition by elastic bunch graph matching. IEEE Trans Pattern Anal Mach Intell 19:775–779

  12. Schroff F, Kalenichenko D, Philbin J (2015) FaceNet: a unified embedding for face recognition and clustering. arXiv:1503.03832

  13. Gold T (1948) Hearing. II. The physiological basis of the action of the cochlea. Proc R Soc Lond B 135:492–498

  14. Swabey MA, Beeby SP, Brown AD, Chad JE (2004) Using otoacoustic emissions as a biometric. In: Proceedings of the international conference on biometric authentication (ICBA), pp 600–606

  15. Grzanka A, Konopka W, Hatzopoulos S, Zalewski P (2001) TEOAE high resolution time-frequency components and their long term stability. In: Proceedings of the 17th biennial symposium international evoked response audiometry study group (IERASG), p 36

  16. Konopka W, Grzanka A, Zalewski P (2002) Personal long-term reproducibility of the TEOAE time-frequency distributions. Polish J Otolaryngol 56:701–706

  17. Grabham NJ et al (2013) An evaluation of otoacoustic emissions as a biometric. IEEE Trans Inf Forensics Secur 8:174–183

  18. Prieve BA, Fitzgerald TS, Schulte LE, Kemp DT (1997) Basic characteristics of distortion product otoacoustic emissions in infants and children. J Acoust Soc Am 102:2871–2879

  19. Konrad-Martin D, Poling GL, Dreisbach LE, Reavis KM, McMillan GP, Lapsley Miller JA (2016) Serial monitoring of otoacoustic emissions in clinical trials. Otol Neurotol 37(8):e286–e294. https://doi.org/10.1097/MAO.0000000000001134

  20. Marlin J, Olofsson Å, Berninger E (2020) Twin study of neonatal transient-evoked otoacoustic emissions. Hear Res 398:108108

  21. Nura Holdings Pty Ltd (2016) Personalization of auditory stimulus. US Patent 9,497,530 B1

  22. Nura Holdings Pty Ltd (2016) Personalization of auditory stimulus. US Patent 9,794,672 B2

  23. Nura Holdings Pty Ltd (2016) Personalization of auditory stimulus. US Patent 10,708,680 B2

  24. Nura Holdings Pty Ltd (2016) Headphones with combined ear-cup and ear-bud. US Patent 10,165,345 B2

  25. NYMI Inc (2016) Preauthorized wearable biometric device, system and method for use thereof. US Patent 9,472,033 B2

  26. Swabey MA et al (2009) The biometric potential of transient otoacoustic emissions. Int J Biom 1:349–364

  27. Chambers P, Grabham NJ, Swabey MA (2011) A comparison of verification in the temporal and cepstrum-transformed domains of transient evoked otoacoustic emissions for biometric identification. Int J Biom 3:246–264

  28. Gao J, Agrafioti F, Wang S, Hatzinakos D (2012) Transient otoacoustic emissions for biometric recognition. In: Proceedings of the IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 2249–2252

  29. Liu Y, Hatzinakos D (2014) Earprint: transient evoked otoacoustic emission for biometrics. IEEE Trans Inf Forensics Secur 9:2291–2300

  30. Tognola G, Grandori F, Ravazzani P (1998) Wavelet analysis of click-evoked otoacoustic emissions. IEEE Trans Biomed Eng 45:686–697

  31. Juang B-H, Katagiri S (1992) Discriminative learning for minimum error classification [pattern recognition]. IEEE Trans Signal Process 40:3043–3054

  32. Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press, Cambridge

  33. Weinberger KQ, Blitzer J, Saul LK (2005) Distance metric learning for large margin nearest neighbor classification. In: Proceedings of the 18th international conference on neural information processing systems, NIPS’05, pp 1473–1480

  34. Hermans A, Beyer L, Leibe B (2017) In defense of the triplet loss for person re-identification. CoRR arXiv:1703.07737

  35. Eyben F, Wöllmer M, Schuller BW (2010) openSMILE: the Munich versatile and fast open-source audio feature extractor. In: Proceedings of the 18th ACM international conference on multimedia, pp 1459–1462

  36. Golik P, Tüske Z, Schlüter R, Ney H (2015) Convolutional neural networks for acoustic modeling of raw time signal in LVCSR. INTERSPEECH

  37. Hoshen Y, Weiss RJ, Wilson KW (2015) Speech acoustic modeling from raw multichannel waveforms. In: 2015 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 4624–4628

  38. Mitra V, Franco H (2015) Time-frequency convolutional networks for robust speech recognition. In: 2015 IEEE workshop on automatic speech recognition and understanding (ASRU), pp 317–323

  39. Li P, Qian J, Wang T (2015) Automatic instrument recognition in polyphonic music using convolutional neural networks. CoRR arXiv:1511.05520

  40. Palaz D, Magimai-Doss M, Collobert R (2015) Analysis of CNN-based speech recognition system using raw speech as input. INTERSPEECH

  41. Schlüter R, Bezrukov I, Wagner H, Ney H (2007) Gammatone features and feature combination for large vocabulary speech recognition. In: 2007 IEEE international conference on acoustics, speech and signal processing (ICASSP), vol 4, pp IV-649–IV-652

  42. Abdoli S, Cardinal P, Koerich AL (2019) End-to-end environmental sound classification using a 1D convolutional neural network. Expert Syst Appl 136:252–263

  43. Cheuk KW, Anderson H, Agres K, Herremans D (2020) nnAudio: an on-the-fly GPU audio to spectrogram conversion toolbox using 1D convolutional neural networks. IEEE Access 8:161981–162003. https://doi.org/10.1109/ACCESS.2020.3019084

  44. Chowdhury A, Ross A (2020) Fusing MFCC and LPC features using 1D triplet CNN for speaker recognition in severely degraded audio signals. IEEE Trans Inf Forensics Secur 15:1616–1629. https://doi.org/10.1109/TIFS.2019.2941773

  45. Tüske Z, Golik P, Schlüter R, Ney H (2014) Acoustic modeling with deep neural networks using raw time signal for LVCSR. INTERSPEECH

  46. Transient Otoacoustic Emission (TEOAE) database. Biometrics Security Lab, University of Toronto. http://www.comm.utoronto.ca/~biometrics/databases

  47. Kunze J, Kirsch L, Kurenkov I, Krug A, Johannsmeier J, Stober S (2017) Transfer learning for speech recognition on a budget

  48. Ghosal D, Kolekar MH (2018) Music genre recognition using deep neural networks and transfer learning. Proc Interspeech 2018:2087–2091

  49. Qin C-X, Qu D, Zhang L-H (2018) Towards end-to-end speech recognition with transfer learning. EURASIP J Audio Speech Music Process 2018

  50. van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9:2579–2605

Acknowledgements

The authors would like to thank the Biometrics Security Lab at the University of Toronto for providing the dataset, and Dimitrios Hatzinakos for explaining their work [28]. We would also like to thank Prashant Maheshwari, Ganesh Tata, and Sangeeth, colleagues of Akshath at Capillary, for fruitful discussions with the authors on audio processing.

Author information

Corresponding author

Correspondence to Akshath Varugeese.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Varugeese, A., Shahina, A., Nawas, K. et al. EarNet: Biometric Embeddings for End to End Person Authentication System Using Transient Evoked Otoacoustic Emission Signals. Neural Process Lett 54, 21–41 (2022). https://doi.org/10.1007/s11063-021-10546-2
