ABSTRACT
It is estimated that by 2024 more than 8.4 billion devices worldwide will be equipped with voice assistant software. While these devices offer convenience to consumers, they suffer from a myriad of security issues. This paper highlights the serious privacy threats posed by information leakage in the metadata of a smart assistant's encrypted network traffic. To investigate this issue, we collected a new dataset of dynamic and static commands posed to an Amazon Echo Dot, using data collection and cleaning scripts we developed.
Furthermore, we propose the Smart Home Assistant Malicious Ensemble model (SHAME) as the new state-of-the-art Voice Command Fingerprinting classifier. When evaluated against several datasets, our attack correctly classifies encrypted voice commands with up to 99.81% accuracy on Google Home traffic and 95.2% accuracy on Amazon Echo Dot traffic. These findings show that security measures must be taken to stop internet service providers, nation-states, and network eavesdroppers from monitoring our intimate conversations.
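To make the idea of fingerprinting voice commands from encrypted traffic metadata concrete, the following is a minimal sketch, not the SHAME architecture itself. It assumes each captured trace is a list of signed packet sizes (positive for outgoing, negative for incoming), which is a hypothetical representation chosen for illustration. Two simple nearest-centroid members, each looking at different metadata features, are combined by averaging their class probabilities (soft voting); the class names, toy traces, and helper names are all invented for this example.

```python
from collections import defaultdict
from math import exp, sqrt


def features(trace):
    """Summarize a trace of signed packet sizes (+ outgoing, - incoming)."""
    out_bytes = sum(s for s in trace if s > 0)
    in_bytes = -sum(s for s in trace if s < 0)
    return [float(len(trace)), float(out_bytes), float(in_bytes)]


class CentroidModel:
    """Nearest-centroid classifier over a chosen subset of feature indices."""

    def __init__(self, idx):
        self.idx = idx
        self.centroids = {}

    def _select(self, trace):
        f = features(trace)
        return [f[i] for i in self.idx]

    def fit(self, traces, labels):
        groups = defaultdict(list)
        for trace, label in zip(traces, labels):
            groups[label].append(self._select(trace))
        # Each centroid is the per-feature mean of the class's training rows.
        self.centroids = {
            label: [sum(col) / len(col) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict_proba(self, trace):
        v = self._select(trace)
        dists = {
            label: sqrt(sum((a - b) ** 2 for a, b in zip(v, c)))
            for label, c in self.centroids.items()
        }
        # Softmax over negative distances, normalized by the largest distance.
        scale = max(dists.values()) or 1.0
        weights = {label: exp(-d / scale) for label, d in dists.items()}
        total = sum(weights.values())
        return {label: w / total for label, w in weights.items()}


def ensemble_predict(models, trace):
    """Soft voting: average member probabilities and return the top class."""
    avg = defaultdict(float)
    for model in models:
        for label, p in model.predict_proba(trace).items():
            avg[label] += p / len(models)
    return max(avg, key=avg.get)


# Toy training data (invented): two hypothetical commands, two traces each.
train_traces = [
    [210, -1400, -1380, -1420, 60],
    [190, -1390, -1400, -1410, 66],
    [205, -310, 60],
    [195, -290, 62],
]
train_labels = ["weather", "weather", "timer", "timer"]

models = [
    CentroidModel([0]).fit(train_traces, train_labels),     # packet count only
    CentroidModel([1, 2]).fit(train_traces, train_labels),  # byte volumes only
]

print(ensemble_predict(models, [200, -1400, -1400, -1390, 60]))
print(ensemble_predict(models, [198, -305, 61]))
```

The sketch never inspects payload contents: packet counts and byte volumes alone separate the two toy commands, which is the kind of leakage the abstract describes. A realistic attack would use far richer features and learned models.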
Index Terms
- What a SHAME: Smart Assistant Voice Command Fingerprinting Utilizing Deep Learning