research-article

VocalLock: Sensing Vocal Tract for Passphrase-Independent User Authentication Leveraging Acoustic Signals on Smartphones

Authors:
Li Lu, Shanghai Jiao Tong University, Department of Computer Science and Engineering, Shanghai, China
Jiadi Yu, Shanghai Jiao Tong University, Department of Computer Science and Engineering, Shanghai, China
Yingying Chen, Rutgers University, WINLAB and Department of Electrical and Computer Engineering, New Brunswick, NJ, USA
Yan Wang, Temple University, Department of Computer and Information Sciences, Philadelphia, PA, USA

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 4, Issue 2, Article No. 51, pp. 1–24. https://doi.org/10.1145/3397320

Published: 15 June 2020

Abstract

Recent years have witnessed a surge of biometric-based user authentication on mobile devices, owing to its promising security and convenience. As a natural and ubiquitous behavior, human speech has been exploited for user authentication. Existing voice-based user authentication relies on the unique characteristics of either the voiceprint or mouth movements, both of which are vulnerable to replay attacks and mimicry attacks. During speaking, the vocal tract, including its static shape and dynamic movements, also exhibits individual uniqueness, and it is difficult for adversaries to eavesdrop on or imitate. Hence, our work aims to employ the individual uniqueness of the vocal tract to realize user authentication on mobile devices. Moreover, most voice-based user authentication schemes are passphrase-dependent, which significantly degrades the user experience. Such schemes therefore need to be implemented in a passphrase-independent manner while remaining able to resist various attacks. In this paper, we propose VocalLock, a user authentication system that uses acoustic signals on smartphones to sense the whole vocal tract during speaking and identify different individuals in a passphrase-independent manner. VocalLock first applies frequency-modulated continuous wave (FMCW) to acoustic signals to characterize both the static shape and dynamic movements of the vocal tract during speaking, and then constructs a passphrase-independent user authentication model from the unique characteristics of the vocal tract through a Gaussian mixture model with a universal background model (GMM-UBM). VocalLock can resist various spoofing attacks while achieving a satisfactory user experience. Extensive experiments in real environments demonstrate that VocalLock can accurately authenticate user identity in a passphrase-independent manner and successfully resist various attacks.
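
To make the FMCW step more concrete, the following is a minimal sketch of how a reflector's distance can be recovered from an acoustic chirp and its echo. It is not the authors' implementation: the sampling rate, the 18–22 kHz sweep, the 40 ms chirp duration, the single idealized reflector, and all function names (`chirp_phase`, `simulate_echo`, `estimate_distance`) are assumptions chosen for illustration. The sketch multiplies the transmitted chirp by the received echo and reads the delay from the resulting low-frequency beat tone.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C (assumed)

# Illustrative parameters, not taken from the paper.
FS = 48_000          # sampling rate in Hz
F0 = 18_000.0        # chirp start frequency in Hz
BANDWIDTH = 4_000.0  # swept bandwidth in Hz
DURATION = 0.04      # chirp duration in seconds


def chirp_phase(t, f0, bandwidth, duration):
    """Phase of a linear up-chirp sweeping f0 -> f0 + bandwidth."""
    slope = bandwidth / duration
    return 2.0 * np.pi * (f0 * t + 0.5 * slope * t ** 2)


def simulate_echo(t, delay, f0, bandwidth, duration, gain=0.3):
    """Idealized single reflection: a delayed, attenuated chirp copy."""
    echo = gain * np.cos(chirp_phase(t - delay, f0, bandwidth, duration))
    echo[t < delay] = 0.0  # nothing is received before the echo arrives
    return echo


def estimate_distance(tx, rx, fs, bandwidth, duration, max_range=1.0):
    """Estimate reflector distance from the beat frequency of tx * rx."""
    # Mixing yields a beat tone at slope * delay plus a high-frequency term.
    mixed = tx * rx * np.hanning(len(tx))
    spectrum = np.abs(np.fft.rfft(mixed))
    freqs = np.fft.rfftfreq(len(mixed), d=1.0 / fs)
    # Search only beat frequencies that correspond to ranges up to max_range.
    max_beat = bandwidth * (2.0 * max_range / SPEED_OF_SOUND) / duration
    band = (freqs > 0) & (freqs <= max_beat)
    beat = freqs[band][np.argmax(spectrum[band])]
    delay = beat * duration / bandwidth  # invert beat = slope * delay
    return SPEED_OF_SOUND * delay / 2.0  # halve the round-trip path


t = np.arange(int(round(FS * DURATION))) / FS
tx = np.cos(chirp_phase(t, F0, BANDWIDTH, DURATION))
true_distance = 0.30  # metres, hypothetical reflector position
rx = simulate_echo(t, 2.0 * true_distance / SPEED_OF_SOUND,
                   F0, BANDWIDTH, DURATION)
estimate = estimate_distance(tx, rx, FS, BANDWIDTH, DURATION)
print(f"estimated distance: {estimate:.3f} m")
```

With these assumed parameters the range resolution is c / (2B), about 4.3 cm, so a single chirp only locates a reflector coarsely. A real recording would contain many overlapping reflections plus the speaker-to-microphone direct path, which this sketch ignores.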


Index Terms

1. Human-centered computing
  1. Ubiquitous and mobile computing
2. Security and privacy
  1. Security services
    1. Authentication


Published in

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies Volume 4, Issue 2
June 2020
771 pages
EISSN:2474-9567
DOI:10.1145/3406789

Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
FMCW
User authentication
acoustic signal
passphrase-independent
vocal-tract behavior
Qualifiers
- research-article
- Research
- Refereed