ABSTRACT
With the continuing growth of voice-controlled devices, voice biometrics have been widely used for user identification. However, voice biometrics are vulnerable to replay attacks and ambient noise. We identify that this fundamental vulnerability is rooted in the indirect sensing modality (e.g., a microphone). In this paper, we present VocalPrint, a resilient mmWave interrogation system that directly captures and analyzes vocal vibrations for user authentication. Specifically, VocalPrint exploits the unique disturbance of skin-reflected radio frequency (RF) signals around the near-throat region of the user, caused by vocal vibrations during communication. The complex ambient noise is isolated from the RF signal using a novel resilience-aware clutter suppression approach that preserves fine-grained vocal biometric properties. Afterward, we extract text-independent vocal tract and vocal source features and feed them to an ensemble classifier for user authentication. VocalPrint is practical: it relies on low-cost, portable, and energy-efficient hardware that allows an effortless transition to a smartphone, and its non-contact nature gives it usability comparable to typical voice authentication systems. Our experimental results from 41 participants with different interrogation distances, orientations, and body motions show that VocalPrint can achieve over 96% authentication accuracy even under unfavorable conditions. We demonstrate the resilience of our system against complex noise interference and spoof attacks of various threat levels.
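To illustrate why skin-reflected mmWave signals can carry vocal vibration information, the sketch below applies the standard round-trip radar phase relation, Δφ = 4πΔd/λ, to a small surface displacement. This is a minimal illustration and not the VocalPrint pipeline: the abstract does not state a carrier frequency or a vibration amplitude, so the 77 GHz carrier and the 10 µm displacement used in the usage note are assumptions chosen only for the arithmetic, and the function names are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def wavelength(carrier_hz: float) -> float:
    """Free-space wavelength (metres) of a carrier at carrier_hz."""
    return SPEED_OF_LIGHT / carrier_hz


def phase_shift(displacement_m: float, carrier_hz: float) -> float:
    """Round-trip phase change (radians) when the reflecting surface
    moves by displacement_m along the line of sight.

    The factor 4*pi (rather than 2*pi) accounts for the signal
    travelling to the surface and back.
    """
    return 4.0 * math.pi * displacement_m / wavelength(carrier_hz)


def displacement_from_phase(phase_rad: float, carrier_hz: float) -> float:
    """Inverse of phase_shift: recover displacement (metres) from phase."""
    return phase_rad * wavelength(carrier_hz) / (4.0 * math.pi)


if __name__ == "__main__":
    carrier = 77e9   # ASSUMPTION: illustrative mmWave carrier frequency
    vibration = 10e-6  # ASSUMPTION: illustrative skin displacement (10 micrometres)
    dphi = phase_shift(vibration, carrier)
    print(f"wavelength: {wavelength(carrier) * 1e3:.3f} mm")
    print(f"phase shift: {dphi:.4f} rad")
```

Because the wavelength is only a few millimetres at such frequencies, micrometre-scale motion of the skin produces a phase change that grows in direct proportion to the displacement. That sensitivity is what makes direct sensing of vocal vibrations plausible, but it also means unrelated body motion shows up in the same phase signal, which is the problem the clutter suppression step in the abstract addresses.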
Index Terms
- VocalPrint: exploring a resilient and secure voice authentication via mmWave biometric interrogation