Abstract
Recent years have witnessed a surge in biometric-based user authentication for mobile devices owing to its promising security and convenience. As a natural and ubiquitous behavior, human speech has been exploited for user authentication. Existing voice-based user authentication schemes extract unique characteristics from either the voiceprint or mouth movements, which leaves them vulnerable to replay attacks and mimic attacks. During speaking, the vocal tract, including its static shape and dynamic movements, also exhibits individual uniqueness, and these characteristics are difficult for adversaries to eavesdrop on or imitate. Hence, our work aims to exploit the individual uniqueness of the vocal tract to realize user authentication on mobile devices. Moreover, most voice-based user authentication schemes are passphrase-dependent, which significantly degrades the user experience. Such authentication therefore needs to work in a passphrase-independent manner while still resisting various attacks. In this paper, we propose VocalLock, a user authentication system that leverages acoustic signals on smartphones to sense the whole vocal tract during speaking and identify different individuals in a passphrase-independent manner. VocalLock first applies FMCW (frequency-modulated continuous wave) sensing to acoustic signals to characterize both the static shape and dynamic movements of the vocal tract during speaking. It then constructs a passphrase-independent user authentication model from the unique characteristics of the vocal tract using GMM-UBM (Gaussian mixture model with a universal background model). VocalLock can resist various spoofing attacks while achieving a satisfactory user experience. Extensive experiments in real environments demonstrate that VocalLock can accurately authenticate user identity in a passphrase-independent manner and successfully resist various attacks.
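To make the FMCW step concrete, the following is a minimal sketch of generic acoustic FMCW ranging, not VocalLock's implementation. It assumes a hypothetical chirp sweeping 17 to 23 kHz over 40 ms at a 48 kHz sampling rate and a single point reflector. The chirp is modelled as a complex (analytic) signal so that dechirping yields only the difference-frequency tone. A real phone records a real-valued signal and would need a low-pass filter after mixing. The function names `fmcw_chirp`, `simulate_echo` and `estimate_distance` are illustrative only. The abstract does not state VocalLock's actual chirp parameters, frame length, or how it turns range information into vocal tract features.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C

def fmcw_chirp(fs, f0, bandwidth, duration):
    """Complex (analytic) linear chirp sweeping f0 -> f0 + bandwidth over `duration` s."""
    t = np.arange(int(round(fs * duration))) / fs
    slope = bandwidth / duration  # sweep rate k in Hz/s
    return t, np.exp(2j * np.pi * (f0 * t + 0.5 * slope * t ** 2))

def simulate_echo(t, f0, bandwidth, duration, distance, gain=0.5):
    """Attenuated copy of the chirp delayed by the round trip to a single reflector."""
    tau = 2.0 * distance / SPEED_OF_SOUND
    slope = bandwidth / duration
    td = t - tau
    echo = gain * np.exp(2j * np.pi * (f0 * td + 0.5 * slope * td ** 2))
    echo[td < 0] = 0.0  # nothing has arrived before the first round trip
    return echo

def estimate_distance(tx, rx, fs, bandwidth, duration, pad_factor=16):
    """Dechirp, find the beat-frequency peak, and convert it to a one-way range."""
    beat = tx * np.conj(rx)  # a tone at f_b = k * tau (plus a constant phase)
    n_fft = pad_factor * len(beat)  # zero-padding gives a finer frequency grid
    spectrum = np.abs(np.fft.fft(beat, n=n_fft))
    freqs = np.fft.fftfreq(n_fft, d=1.0 / fs)
    positive = freqs > 0
    f_beat = freqs[positive][np.argmax(spectrum[positive])]
    slope = bandwidth / duration
    # f_b = k * tau and tau = 2 d / c  =>  d = c * f_b / (2 k)
    return SPEED_OF_SOUND * f_beat / (2.0 * slope)

if __name__ == "__main__":
    fs, f0, bandwidth, duration = 48_000, 17_000.0, 6_000.0, 0.04
    t, tx = fmcw_chirp(fs, f0, bandwidth, duration)
    rx = simulate_echo(t, f0, bandwidth, duration, distance=0.30)
    print(f"estimated range: {estimate_distance(tx, rx, fs, bandwidth, duration):.3f} m")
```

With a 6 kHz sweep, the range resolution c/(2B) is about 2.9 cm. Several reflectors, such as different parts of the mouth and vocal tract, would show up as separate peaks in the same spectrum. Comparing these range profiles across successive chirps is the usual way FMCW captures both static geometry and movement.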
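The GMM-UBM stage can be illustrated with the textbook recipe of Reynolds et al., again as a sketch under stated assumptions rather than VocalLock's actual model. The recipe has three steps:

1. Train a diagonal-covariance GMM (the universal background model) on pooled features from many users.
2. MAP-adapt only its means to a user's enrollment features, with relevance factor 16.
3. Score a test utterance by the average per-frame log-likelihood ratio between the adapted model and the UBM.

VocalLock's FMCW-derived vocal tract features are replaced here by synthetic 4-dimensional Gaussian vectors. The component count, iteration count, variance floor and all helper names are hypothetical choices for illustration.

```python
import numpy as np

def _logsumexp(a, axis=1):
    m = np.max(a, axis=axis, keepdims=True)
    return (m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))).squeeze(axis)

def _log_gauss_diag(X, means, variances):
    """log N(x | mu_k, diag(var_k)) for every frame/component pair -> shape (N, K)."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :]
                   + np.sum(diff ** 2 / variances[None, :, :], axis=2))

def _posteriors(X, weights, means, variances):
    """Component responsibilities (N, K) and per-frame log-likelihoods (N,)."""
    log_p = np.log(weights)[None, :] + _log_gauss_diag(X, means, variances)
    log_frame = _logsumexp(log_p, axis=1)
    return np.exp(log_p - log_frame[:, None]), log_frame

def train_ubm(X, n_components, n_iter=50, seed=0, var_floor=1e-3):
    """Plain EM for a diagonal-covariance GMM trained on pooled background data."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), n_components, replace=False)]
    variances = np.tile(X.var(axis=0), (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        gamma, _ = _posteriors(X, weights, means, variances)  # E-step
        nk = gamma.sum(axis=0) + 1e-10                       # soft counts per component
        weights = nk / len(X)                                # M-step updates
        means = gamma.T @ X / nk[:, None]
        variances = np.maximum(gamma.T @ X ** 2 / nk[:, None] - means ** 2, var_floor)
    return weights, means, variances

def map_adapt_means(X, ubm, relevance=16.0):
    """Mean-only MAP adaptation: mu_k <- alpha_k * E_k[x] + (1 - alpha_k) * mu_k."""
    weights, means, variances = ubm
    gamma, _ = _posteriors(X, weights, means, variances)
    nk = gamma.sum(axis=0)
    ex = gamma.T @ X / np.maximum(nk, 1e-10)[:, None]
    alpha = (nk / (nk + relevance))[:, None]
    return weights, alpha * ex + (1.0 - alpha) * means, variances

def llr_score(X, speaker_model, ubm):
    """Average per-frame log-likelihood ratio; larger means more speaker-like."""
    return _posteriors(X, *speaker_model)[1].mean() - _posteriors(X, *ubm)[1].mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 4
    # Hypothetical background population: 10 users, 200 feature frames each.
    centers = rng.uniform(-3.0, 3.0, size=(10, dim))
    background = np.vstack([c + rng.normal(size=(200, dim)) for c in centers])
    ubm = train_ubm(background, n_components=8)

    target_center = np.array([6.0, 6.0, 0.0, 0.0])
    speaker_model = map_adapt_means(target_center + rng.normal(size=(200, dim)), ubm)
    genuine = target_center + rng.normal(size=(100, dim))
    impostor = np.array([-6.0, -6.0, 0.0, 0.0]) + rng.normal(size=(100, dim))
    print("genuine LLR :", llr_score(genuine, speaker_model, ubm))
    print("impostor LLR:", llr_score(impostor, speaker_model, ubm))
```

Accepting a user would then compare the score against a threshold tuned on held-out data. Because the UBM is trained once and only the means are adapted per user, enrollment needs relatively little data. The model also never depends on which words were spoken, which is why this family of models is a common choice for text-independent verification.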
Index Terms
- VocalLock: Sensing Vocal Tract for Passphrase-Independent User Authentication Leveraging Acoustic Signals on Smartphones