EarNet: Biometric Embeddings for End to End Person Authentication System Using Transient Evoked Otoacoustic Emission Signals

Neural Processing Letters

Abstract

Transient Evoked Otoacoustic Emissions (TEOAE) are a class of otoacoustic emissions generated by the cochlea in response to an external stimulus. TEOAE signals exhibit characteristics unique to an individual and are therefore considered a potential biometric modality. Unlike conventional modalities, TEOAE is resistant to replay and falsification attacks because of its inherent liveness detection. In this paper, we propose an efficient deep neural network architecture, EarNet, that learns appropriate filters for the non-stationary TEOAE signals, revealing individual uniqueness and long-term reproducibility. EarNet is inspired by Google's FaceNet. The embeddings generated by EarNet in the Euclidean space reduce intra-subject variability while capturing inter-subject variability, as visualized using t-SNE. These embeddings are used for both identification and verification tasks. The K-Nearest Neighbour classifier gives identification accuracies of 99.21% and 99.42% for the left and right ear, respectively, the highest among the machine learning algorithms explored in this work. Verification using Pearson correlation on the embeddings achieves an equal error rate (EER) of 0.581% and 0.057% for the left and right ear, respectively, outperforming all other techniques considered. A fusion strategy yields an improved identification accuracy of 99.92%. The embeddings generalize well to subjects that are not part of the training set, and hence EarNet scales to new, larger datasets.
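The sketch below illustrates the two downstream tasks the abstract describes, run on pre-computed embeddings. It is a minimal illustration, not the authors' implementation: the synthetic arrays, the 128-dimensional embedding size, and the choice of k = 1 are assumptions made for demonstration only.

```python
# Minimal sketch (not the authors' code) of the two evaluation steps in the
# abstract: k-NN identification on EarNet-style embeddings, and verification
# by Pearson correlation with an equal-error-rate (EER) readout.
# All data below is synthetic; shapes and k=1 are illustrative assumptions.
import numpy as np
from scipy.stats import pearsonr
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
enroll_emb = rng.normal(size=(200, 128))    # hypothetical enrolled embeddings
enroll_ids = rng.integers(0, 20, size=200)  # subject label per embedding
probe_emb = rng.normal(size=(50, 128))      # hypothetical probe embeddings
probe_ids = rng.integers(0, 20, size=50)

# Identification: classify each probe by its nearest enrolled embedding
# in Euclidean space, as described for the KNN classifier.
knn = KNeighborsClassifier(n_neighbors=1, metric="euclidean")
knn.fit(enroll_emb, enroll_ids)
print("identification accuracy:", knn.score(probe_emb, probe_ids))

# Verification: score every (probe, enrolled) pair by Pearson correlation,
# then find where the false-accept and false-reject rates cross (the EER).
def equal_error_rate(scores, genuine):
    order = np.argsort(-scores)          # accept highest-scoring pairs first
    genuine = genuine[order]
    far = np.cumsum(~genuine) / max((~genuine).sum(), 1)    # impostors accepted
    frr = 1.0 - np.cumsum(genuine) / max(genuine.sum(), 1)  # genuines rejected
    i = int(np.argmin(np.abs(far - frr)))
    return (far[i] + frr[i]) / 2.0

scores, genuine = [], []
for i in range(len(probe_emb)):
    for j in range(len(enroll_emb)):
        scores.append(pearsonr(probe_emb[i], enroll_emb[j])[0])
        genuine.append(probe_ids[i] == enroll_ids[j])
print("verification EER:", equal_error_rate(np.array(scores), np.array(genuine)))
```

On synthetic data the printed numbers are meaningless; the point is the mechanics: identification is closed-set classification over the embedding space, while verification thresholds a pairwise similarity score, which is how an EER such as those reported above would be read off.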


References

  1. Kemp D (1978) Acoustic resonances originating inside the cochlea. In: British society of audiology short papers meeting, pp 290–294

  2. Watkin PM (1996) Neonatal otoacoustic emission screening and the identification of deafness. Arch Dis Child Fetal Neonatal Ed 74:F16–F25

  3. Hall J (2000) Handbook of otoacoustic emissions (a singular audiology text). Singular Publishing Group, San Diego

  4. Zimatore G, Giuliani A, Hatzopoulos S, Martini A, Colosimo A (2002) Invariant and subject-dependent features of otoacoustic emissions. In: Proceedings of the 3rd international symposium on medical data analysis, pp 158–166

  5. Hall JW, Baer JE, Chase PA, Schwaber MK (2009) Sex differences in distortion-product and transient-evoked otoacoustic emissions compared. J Acoust Soc Am 125:239–246

  6. Bilger RC, Matthies ML, Hammel DR, Demorest ME (1990) Genetic implications of gender differences in the prevalence of spontaneous otoacoustic emissions. J Speech Lang Hear Res 33:418–432

  7. Whitehead ML, Kamal N, Lonsbury-Martin BL, Martin GK (1993) Spontaneous otoacoustic emissions in different racial groups. Scand Audiol 22:3–10

  8. Matsumoto T, Matsumoto H, Yamada K, Hoshino S (2002) Impact of artificial ‘gummy’ fingers on fingerprint systems. Proc SPIE 4677:275–289

  9. Turk M, Pentland A (1991) Eigenfaces for recognition. J Cognit Neurosci 3:71–86

  10. Gao Y, Leung MKH (2002) Face recognition using line edge map. IEEE Trans Pattern Anal Mach Intell 24:764–779

  11. Wiskott L, Fellous J-M, Krüger N, von der Malsburg C (1997) Face recognition by elastic bunch graph matching. IEEE Trans Pattern Anal Mach Intell 19:775–779

  12. Schroff F, Kalenichenko D, Philbin J (2015) FaceNet: a unified embedding for face recognition and clustering. arXiv:1503.03832

  13. Gold T (1948) Hearing. II. The physiological basis of the action of the cochlea. Proc R Soc Lond B 135:492–498

  14. Swabey MA, Beeby SP, Brown AD, Chad JE (2004) Using otoacoustic emissions as a biometric. In: Proceedings of the international conference on biometric authentication (ICBA), pp 600–606

  15. Grzanka A, Konopka W, Hatzopoulos S, Zalewski P (2001) TEOAE high resolution time-frequency components and their long term stability. In: Proceedings of the 17th biennial symposium international evoked response audiometry study group (IERASG), p 36

  16. Konopka W, Grzanka A, Zalewski P (2002) Personal long-term reproducibility of the TEOAE time-frequency distributions. Polish J Otolaryngol 56:701–706

  17. Grabham NJ et al (2013) An evaluation of otoacoustic emissions as a biometric. IEEE Trans Inf Forensics Secur 8:174–183

  18. Prieve BA, Fitzgerald TS, Schulte LE, Kemp DT (1997) Basic characteristics of distortion product otoacoustic emissions in infants and children. J Acoust Soc Am 102:2871–2879

  19. Konrad-Martin D, Poling GL, Dreisbach LE, Reavis KM, McMillan GP, Lapsley Miller JA (2016) Serial monitoring of otoacoustic emissions in clinical trials. Otol Neurotol 37(8):e286–e294. https://doi.org/10.1097/MAO.0000000000001134

  20. Marlin J, Olofsson Å, Berninger E (2020) Twin study of neonatal transient-evoked otoacoustic emissions. Hear Res 398:108108

  21. Nura Holdings Pty Ltd (2016) Personalization of auditory stimulus. US Patent 9,497,530 B1

  22. Nura Holdings Pty Ltd (2016) Personalization of auditory stimulus. US Patent 9,794,672 B2

  23. Nura Holdings Pty Ltd (2016) Personalization of auditory stimulus. US Patent 10,708,680 B2

  24. Nura Holdings Pty Ltd (2016) Headphones with combined ear-cup and ear-bud. US Patent 10,165,345 B2

  25. NYMI Inc (2016) Preauthorized wearable biometric device, system and method for use thereof. US Patent 9,472,033 B2

  26. Swabey MA et al (2009) The biometric potential of transient otoacoustic emissions. Int J Biom 1:349–364

  27. Chambers P, Grabham NJ, Swabey MA (2011) A comparison of verification in the temporal and cepstrum-transformed domains of transient evoked otoacoustic emissions for biometric identification. Int J Biom 3:246–264

  28. Gao J, Agrafioti F, Wang S, Hatzinakos D (2012) Transient otoacoustic emissions for biometric recognition. In: Proceedings of the IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 2249–2252

  29. Liu Y, Hatzinakos D (2014) Earprint: transient evoked otoacoustic emission for biometrics. IEEE Trans Inf Forensics Secur 9:2291–2300

  30. Tognola G, Grandori F, Ravazzani P (1998) Wavelet analysis of click-evoked otoacoustic emissions. IEEE Trans Biomed Eng 45:686–697

  31. Juang B-H, Katagiri S (1992) Discriminative learning for minimum error classification [pattern recognition]. IEEE Trans Signal Process 40:3043–3054

  32. Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press, Cambridge

  33. Weinberger KQ, Blitzer J, Saul LK (2005) Distance metric learning for large margin nearest neighbor classification. In: Proceedings of the 18th international conference on neural information processing systems, NIPS’05, pp 1473–1480

  34. Hermans A, Beyer L, Leibe B (2017) In defense of the triplet loss for person re-identification. CoRR arXiv:1703.07737

  35. Eyben F, Wöllmer M, Schuller BW (2010) openSMILE: the Munich versatile and fast open-source audio feature extractor. In: Proceedings of the 18th ACM international conference on multimedia, pp 1459–1462

  36. Golik P, Tüske Z, Schlüter R, Ney H (2015) Convolutional neural networks for acoustic modeling of raw time signal in LVCSR. INTERSPEECH

  37. Hoshen Y, Weiss RJ, Wilson KW (2015) Speech acoustic modeling from raw multichannel waveforms. In: 2015 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 4624–4628

  38. Mitra V, Franco H (2015) Time-frequency convolutional networks for robust speech recognition. In: 2015 IEEE workshop on automatic speech recognition and understanding (ASRU), pp 317–323

  39. Li P, Qian J, Wang T (2015) Automatic instrument recognition in polyphonic music using convolutional neural networks. CoRR arXiv:1511.05520

  40. Palaz D, Magimai-Doss M, Collobert R (2015) Analysis of CNN-based speech recognition system using raw speech as input. INTERSPEECH

  41. Schlüter R, Bezrukov I, Wagner H, Ney H (2007) Gammatone features and feature combination for large vocabulary speech recognition. In: 2007 IEEE international conference on acoustics, speech and signal processing (ICASSP), vol 4, pp IV-649–IV-652

  42. Abdoli S, Cardinal P, Koerich AL (2019) End-to-end environmental sound classification using a 1D convolutional neural network. Expert Syst Appl 136:252–263

  43. Cheuk KW, Anderson H, Agres K, Herremans D (2020) nnAudio: an on-the-fly GPU audio to spectrogram conversion toolbox using 1D convolutional neural networks. IEEE Access 8:161981–162003. https://doi.org/10.1109/ACCESS.2020.3019084

  44. Chowdhury A, Ross A (2020) Fusing MFCC and LPC features using 1D triplet CNN for speaker recognition in severely degraded audio signals. IEEE Trans Inf Forensics Secur 15:1616–1629. https://doi.org/10.1109/TIFS.2019.2941773

  45. Tüske Z, Golik P, Schlüter R, Ney H (2014) Acoustic modeling with deep neural networks using raw time signal for LVCSR. INTERSPEECH

  46. Transient Otoacoustic Emission (TEOAE) database. Biometrics Security Lab, University of Toronto. http://www.comm.utoronto.ca/~biometrics/databases

  47. Kunze J, Kirsch L, Kurenkov I, Krug A, Johannsmeier J, Stober S (2017) Transfer learning for speech recognition on a budget

  48. Ghosal D, Kolekar MH (2018) Music genre recognition using deep neural networks and transfer learning. Proc Interspeech 2018:2087–2091

  49. Qin C-X, Qu D, Zhang L-H (2018) Towards end-to-end speech recognition with transfer learning. EURASIP J Audio Speech Music Process 2018

  50. van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9:2579–2605

Acknowledgements

The authors would like to thank the Biometrics Security Lab at the University of Toronto for providing the dataset, and Dimitrios Hatzinakos for explaining their work [28]. We would also like to thank Prashant Maheshwari, Ganesh Tata, and Sangeeth, colleagues of Akshath at Capillary, for fruitful discussions with the authors on audio processing.

Author information

Corresponding author

Correspondence to Akshath Varugeese.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Varugeese, A., Shahina, A., Nawas, K. et al. EarNet: Biometric Embeddings for End to End Person Authentication System Using Transient Evoked Otoacoustic Emission Signals. Neural Process Lett 54, 21–41 (2022). https://doi.org/10.1007/s11063-021-10546-2
