research-article

Voice In Ear: Spoofing-Resistant and Passphrase-Independent Body Sound Authentication

Published: 30 March 2021

Abstract

With the rapid growth of wearable computing and the increasing demand for mobile authentication, voiceprint-based authentication has become one of the prevalent technologies and has already shown tremendous potential to the public. However, it is vulnerable to voice spoofing attacks (e.g., replay attacks and synthetic voice attacks). To address this threat, we propose a new biometric authentication approach, named EarPrint, which extends voiceprint to build a hidden and secure user authentication scheme on earphones. EarPrint builds on the transmission of speaking-induced body sound from the throat to the ear canal: different users exhibit different body sound conduction patterns at both ears. In this first exploratory study, extensive experiments on 23 subjects show that EarPrint is robust against ambient noise and body motion. EarPrint achieves an Equal Error Rate (EER) of 3.64% with 75 seconds of enrollment data. We also evaluate the resilience of EarPrint against replay attacks. A major contribution of EarPrint is that it leverages two levels of uniqueness, namely the body sound conduction from the throat to the ear canal and the body asymmetry between the left and right ears, taking advantage of the paired form factor of earphones. Compared with other mobile and wearable biometric modalities, EarPrint is a low-cost, accurate, and secure authentication solution for earphone users.
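For readers unfamiliar with the Equal Error Rate quoted in the abstract, the following is a minimal sketch of how an EER is commonly computed from verification scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the convention that a higher score means "more likely genuine", and the example scores are all assumptions made for illustration.

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Return (eer, threshold) at the point where FAR and FRR are closest.

    Assumption: a sample is accepted when its score >= threshold.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    # Every observed score is a candidate decision threshold.
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    best_gap, eer, best_thr = np.inf, None, None
    for thr in thresholds:
        far = np.mean(impostor >= thr)  # false acceptance rate
        frr = np.mean(genuine < thr)    # false rejection rate
        gap = abs(far - frr)
        if gap < best_gap:
            # Report the midpoint, since FAR and FRR rarely cross exactly.
            best_gap, eer, best_thr = gap, (far + frr) / 2, thr
    return eer, best_thr

# Hypothetical similarity scores, not data from the paper.
genuine_scores = [0.9, 0.8, 0.7, 0.6]
impostor_scores = [0.1, 0.2, 0.3, 0.65]
eer, thr = equal_error_rate(genuine_scores, impostor_scores)
print(f"EER = {eer:.2%} at threshold {thr}")
```

A lower EER means the false acceptance and false rejection rates are both small at the operating point where they balance, which is why it is a convenient single-number summary for comparing biometric systems.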

      Published In

      Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies  Volume 5, Issue 1
      March 2021
      1272 pages
      EISSN:2474-9567
      DOI:10.1145/3459088

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Voiceprint
      2. authentication
      3. earphones

      Qualifiers

      • Research-article
      • Research
      • Refereed
