DOI: 10.1145/3427228.3427259
Research article
Public Access

WearID: Low-Effort Wearable-Assisted Authentication of Voice Commands via Cross-Domain Comparison without Training

Published: 08 December 2020

Abstract

Due to the open nature of voice input, voice assistant (VA) systems (e.g., Google Home and Amazon Alexa) are vulnerable to security and privacy leaks (e.g., of credit card numbers and passwords), especially when users issue critical commands involving large purchases, critical calls, etc. Although existing VA systems may use voice features to identify users, they remain vulnerable to various acoustic attacks (e.g., impersonation, replay, and hidden command attacks). In this work, we propose WearID, a training-free voice authentication system that leverages the cross-domain speech similarity between the audio domain and the vibration domain to strengthen the security of the ever-growing deployment of VA systems. In particular, when a user gives a critical command, WearID uses the motion sensors on the user's wearable device to capture the aerial speech in the vibration domain and verifies it against the speech captured in the audio domain by the VA device's microphone. Compared to existing approaches, our solution is low-effort and privacy-preserving: it neither requires active user input (e.g., replying to messages or calls) nor stores users' privacy-sensitive voice samples for training. In addition, our solution exploits the distinct vibration sensing interface and its short sensing range for sound (e.g., 25 cm) to verify voice commands.

Examining the similarity of the two domains' data is not trivial. The large sampling-rate gap between the audio and vibration domains (e.g., 8000 Hz vs. 200 Hz) makes it hard to compare their data directly, and even small amounts of noise can be magnified and cause authentication failures. To address these challenges, we investigate the complex relationship between the two sensing domains and develop a spectrogram-based algorithm that converts the microphone data into lower-frequency "motion sensor data" to facilitate cross-domain comparison.
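The abstract does not spell out the conversion algorithm. As a hedged illustration only, the sketch below assumes the conversion can be approximated by spectral folding: a motion sensor sampled at 200 Hz without an anti-aliasing filter maps an audio component at frequency f to the alias |f - 200 * round(f / 200)| inside the 0 to 100 Hz band. The helper names, the 20 ms Hann-windowed frames, and the 11-bin output resolution are assumptions, not WearID's actual parameters, and the sensor's own frequency response is ignored.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """One-sided magnitude spectrum of a Hann-windowed frame (naive O(N^2) DFT)."""
    n = len(frame)
    # Periodic Hann window limits leakage into neighbouring frequency bins.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, x in enumerate(frame)]
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(windowed)))
        for k in range(n // 2 + 1)
    ]


def alias_frequency(freq_hz, vib_rate_hz):
    """Apparent frequency of a freq_hz tone sampled at vib_rate_hz with no anti-alias filter."""
    return abs(freq_hz - vib_rate_hz * round(freq_hz / vib_rate_hz))


def fold_frame(magnitudes, frame_len, audio_rate_hz, vib_rate_hz, n_vib_bins):
    """Fold one audio spectrum into 0..vib_rate/2, accumulating power per output bin."""
    bin_width = (vib_rate_hz / 2) / (n_vib_bins - 1)
    folded = [0.0] * n_vib_bins
    for k, mag in enumerate(magnitudes):
        f_alias = alias_frequency(k * audio_rate_hz / frame_len, vib_rate_hz)
        folded[int(f_alias / bin_width + 0.5)] += mag ** 2
    return folded


def audio_to_vibration_spectrogram(samples, audio_rate_hz=8000, vib_rate_hz=200,
                                   frame_len=160, hop=80, n_vib_bins=11):
    """Hypothetical helper: low-band 'pseudo-vibration' spectrogram from microphone samples."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        mags = magnitude_spectrum(samples[start:start + frame_len])
        frames.append(fold_frame(mags, frame_len, audio_rate_hz, vib_rate_hz, n_vib_bins))
    return frames


if __name__ == "__main__":
    # A 250 Hz tone is far above the 100 Hz Nyquist limit of a 200 Hz sensor,
    # but it aliases to |250 - 200| = 50 Hz, i.e. output bin 5 (10 Hz per bin).
    rate = 8000
    tone = [math.sin(2 * math.pi * 250 * t / rate) for t in range(1600)]
    spec = audio_to_vibration_spectrogram(tone)
    print(len(spec), spec[0].index(max(spec[0])))
```

Running the script prints `19 5`: nineteen 20 ms frames, each with its strongest energy in the 50 Hz bin. Power (rather than magnitude) is accumulated so that several audio components folding onto the same low-band bin add up consistently.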
We further develop a user authentication scheme that verifies, based on the cross-domain speech similarity of the received voice command, that the command originates from the legitimate user. We report on extensive experiments evaluating WearID under various audible and inaudible attacks. The results show that WearID verifies voice commands with 99.8% accuracy under normal conditions and detects 97.2% of fake voice commands from various attacks, including impersonation/replay attacks and hidden voice/ultrasound attacks.
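The abstract does not state which similarity measure or decision rule WearID applies. A minimal sketch, assuming Pearson correlation between flattened spectrograms, a search over a few frame lags to tolerate a small timing offset between the VA device and the wearable, and a hypothetical acceptance threshold of 0.7. Here `converted` stands for a spectrogram derived from the microphone recording and `measured` for one computed from the wearable's motion-sensor data, both as lists of equal-length frames; none of these names or values come from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a flat spectrogram carries no evidence of a match
    return cov / (sx * sy)


def cross_domain_similarity(converted, measured, max_lag=3):
    """Best correlation between two spectrograms over frame lags in [-max_lag, max_lag]."""
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        # Compare converted[i + lag] with measured[i] (or the reverse for negative lags).
        a = converted[max(lag, 0):]
        b = measured[max(-lag, 0):]
        n = min(len(a), len(b))
        if n == 0:
            continue
        flat_a = [v for frame in a[:n] for v in frame]
        flat_b = [v for frame in b[:n] for v in frame]
        best = max(best, pearson(flat_a, flat_b))
    return best


def authenticate(converted, measured, threshold=0.7):
    """Hypothetical decision rule: accept only if the two domains agree closely enough."""
    return cross_domain_similarity(converted, measured) >= threshold


if __name__ == "__main__":
    converted = [[1.0, 2.0, 3.0], [3.0, 1.0, 0.0], [0.0, 5.0, 1.0], [2.0, 2.0, 4.0]]
    # Same pattern, rescaled and delayed by one frame: should be accepted.
    measured = [[0.0, 0.0, 0.0]] + [[2 * v + 1 for v in f] for f in converted]
    print(authenticate(converted, measured))
```

The demo prints `True`. Correlation is used here because it ignores the overall gain and offset differences expected between a microphone and an accelerometer; in practice the threshold would have to be tuned on real data.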




      Published In

      ACM Other conferences
      ACSAC '20: Proceedings of the 36th Annual Computer Security Applications Conference
      December 2020
      962 pages
      ISBN:9781450388580
      DOI:10.1145/3427228
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. Motion Sensor
      2. User Authentication
      3. Voice Assistant Systems

      Qualifiers

      • Research-article
      • Research
      • Refereed limited


      Conference

      ACSAC '20

      Acceptance Rates

      Overall Acceptance Rate 104 of 497 submissions, 21%

      Article Metrics

      • Downloads (last 12 months): 186
      • Downloads (last 6 weeks): 30
      Reflects downloads up to 30 Jan 2025

      Cited By

      • (2024) User Authentication in the IoT and IIoT Environment. Smart and Agile Cybersecurity for IoT and IIoT Environments, 169-194. DOI: 10.4018/979-8-3693-3451-5.ch008. Online publication date: 30-Jun-2024.
      • (2024) SAFARI: Speech-Associated Facial Authentication for AR/VR Settings via Robust VIbration Signatures. Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, 153-167. DOI: 10.1145/3658644.3670358. Online publication date: 2-Dec-2024.
      • (2024) TouchTone: Smartwatch Privacy Protection via Unobtrusive Finger Touch Gestures. Proceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services, 141-154. DOI: 10.1145/3643832.3661884. Online publication date: 3-Jun-2024.
      • (2024) Accuth+: Accelerometer-Based Anti-Spoofing Voice Authentication on Wrist-Worn Wearables. IEEE Transactions on Mobile Computing 23:5, 5571-5588. DOI: 10.1109/TMC.2023.3314837. Online publication date: May-2024.
      • (2023) Smartphone-Key: Hands-Free Two-Factor Authentication for Voice-Controlled Devices Using Wi-Fi Location. IEEE Transactions on Network and Service Management 20:3, 3848-3864. DOI: 10.1109/TNSM.2023.3245080. Online publication date: Sep-2023.
      • (2023) Cross-Modality Continuous User Authentication and Device Pairing With Respiratory Patterns. IEEE Internet of Things Journal 10:16, 14197-14211. DOI: 10.1109/JIOT.2023.3275099. Online publication date: 15-Aug-2023.
      • (2023) The Passport: A Single-Node Authentication System for Heterogeneous Voice-Controlled IoT Networks. 2023 International Wireless Communications and Mobile Computing (IWCMC), 1597-1604. DOI: 10.1109/IWCMC58020.2023.10183058. Online publication date: 19-Jun-2023.
      • (2023) Revisiting the Deep Learning-Based Eavesdropping Attacks via Facial Dynamics from VR Motion Sensors. Information and Communications Security, 399-417. DOI: 10.1007/978-981-99-7356-9_24. Online publication date: 18-Nov-2023.
      • (2022) Accuth. Proceedings of the 20th ACM Conference on Embedded Networked Sensor Systems, 637-650. DOI: 10.1145/3560905.3568522. Online publication date: 6-Nov-2022.
      • (2022) Defending against Thru-barrier Stealthy Voice Attacks via Cross-Domain Sensing on Phoneme Sounds. 2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS), 680-690. DOI: 10.1109/ICDCS54860.2022.00071. Online publication date: Jul-2022.
