research-article

VocalLock: Sensing Vocal Tract for Passphrase-Independent User Authentication Leveraging Acoustic Signals on Smartphones

Published: 15 June 2020

Abstract

Recent years have witnessed a surge of biometric-based user authentication on mobile devices, owing to its promising security and convenience. As a natural and ubiquitous behavior, human speech has been exploited for user authentication. Existing voice-based user authentication relies on unique characteristics of either the voiceprint or mouth movements, both of which are vulnerable to replay and mimicry attacks. During speaking, the vocal tract, including its static shape and dynamic movements, also exhibits individual uniqueness, and it is difficult for adversaries to eavesdrop on or imitate. Hence, our work employs the individual uniqueness of the vocal tract to realize user authentication on mobile devices. Moreover, most voice-based user authentication schemes are passphrase-dependent, which significantly degrades the user experience. There is therefore a pressing need for user authentication that is passphrase-independent while still resisting various attacks. In this paper, we propose VocalLock, a user authentication system that leverages acoustic signals on smartphones to sense the whole vocal tract during speaking and identifies individuals in a passphrase-independent manner. VocalLock first applies frequency-modulated continuous wave (FMCW) to acoustic signals to characterize both the static shape and dynamic movements of the vocal tract during speaking, and then constructs a passphrase-independent user authentication model from the unique characteristics of the vocal tract using a Gaussian mixture model with a universal background model (GMM-UBM). VocalLock can resist various spoofing attacks while providing a satisfactory user experience. Extensive experiments in real environments demonstrate that VocalLock can accurately authenticate users in a passphrase-independent manner and successfully resist various attacks.
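
To make the FMCW step concrete, the following is a minimal sketch of generic FMCW ranging, not the VocalLock pipeline itself. It simulates a single reflector, mixes the transmitted chirp with its delayed echo, and converts the resulting beat frequency into a distance. The abstract gives no signal parameters, so the sampling rate, the 18 to 22 kHz sweep, the 40 ms sweep time, the speed of sound, and the function names are all assumptions chosen for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def chirp_phase(t, f0, bandwidth, sweep_time):
    """Phase (radians) of a linear up-chirp at time t (seconds)."""
    return 2.0 * math.pi * (f0 * t + bandwidth * t * t / (2.0 * sweep_time))


def estimate_distance(distance_m, fs=48000, f0=18000.0,
                      bandwidth=4000.0, sweep_time=0.04, max_bin=50):
    """Simulate one echo at distance_m and recover the distance via FMCW.

    All default parameters are illustrative assumptions, not values
    taken from the paper.
    """
    n = round(fs * sweep_time)
    delay = 2.0 * distance_m / SPEED_OF_SOUND  # round-trip time of flight

    # Mix (multiply) the transmitted chirp with the delayed echo.
    mixed = []
    for i in range(n):
        t = i / fs
        tx = math.cos(chirp_phase(t, f0, bandwidth, sweep_time))
        rx = (math.cos(chirp_phase(t - delay, f0, bandwidth, sweep_time))
              if t >= delay else 0.0)
        mixed.append(tx * rx)

    # Search only the low-frequency DFT bins, where the beat tone lies.
    best_bin, best_mag = 1, 0.0
    for k in range(1, max_bin + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(mixed))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)

    # Bin spacing is 1 / sweep_time; beat = bandwidth * delay / sweep_time.
    beat_freq = best_bin / sweep_time
    return SPEED_OF_SOUND * beat_freq * sweep_time / (2.0 * bandwidth)


if __name__ == "__main__":
    print(estimate_distance(0.17))
```

With these assumed parameters the range resolution is the speed of sound divided by twice the bandwidth, roughly 4.3 cm, so the estimate can only be expected to match the simulated distance to within about half of that. A real system would also have to handle multiple reflectors, direct speaker-to-microphone leakage, and noise, none of which this sketch models.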



      • Published in

        Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 4, Issue 2
        June 2020
        771 pages
        EISSN: 2474-9567
        DOI: 10.1145/3406789

        Copyright © 2020 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Qualifiers

        • research-article
        • Research
        • Refereed
