Abstract
With the increasing prevalence of voice assistant services (VAS), ensuring system security and user privacy has become a significant challenge. A preliminary analysis of existing authentication mechanisms reveals shortcomings, particularly in multi-user settings and in their reliance on additional devices. To address these shortcomings, we propose a novel approach that embeds users’ biometric templates into the neural network model of the voice assistant for identity authentication. Leveraging the robust sound processing capabilities of convolutional neural networks (CNNs), the method uses watermarking technology within the model to verify user identity. Experimental results demonstrate that the method effectively verifies user identities with negligible impact on the original model’s performance. A further evaluation with 10 participants and 300 different voice commands yielded an overall accuracy of 99.01% and an equal error rate (EER) of 1.25%.
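For readers unfamiliar with the metric, the equal error rate is the operating point at which the false acceptance rate (impostors accepted) equals the false rejection rate (genuine users rejected). The sketch below shows one common way to estimate it from two lists of similarity scores. The function name `equal_error_rate`, the accept rule `score >= threshold`, and the toy scores are illustrative assumptions, not the evaluation procedure or data used in the paper.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Estimate (eer, threshold) at the point where FAR and FRR are closest.

    Assumption: scores are similarities, so higher means "more likely the
    enrolled user", and a trial is accepted when score >= threshold.
    """
    if not genuine_scores or not impostor_scores:
        raise ValueError("both score lists must be non-empty")

    best = None  # (gap between FAR and FRR, EER estimate, threshold)
    # Every observed score is tried as a candidate threshold.
    for t in sorted(set(genuine_scores) | set(impostor_scores)):
        # False acceptance rate: fraction of impostor trials accepted.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        # False rejection rate: fraction of genuine trials rejected.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy, made-up scores for illustration only (not results from the paper).
genuine = [0.91, 0.85, 0.78, 0.40]
impostor = [0.10, 0.22, 0.35, 0.80]
eer, threshold = equal_error_rate(genuine, impostor)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With a finite set of trials the two error curves rarely cross exactly, so this sketch reports the average of FAR and FRR at the threshold where they are closest. Other estimators, such as interpolating between adjacent thresholds, are also in common use.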
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Cao, P., Wang, Y., Si, Z., Lyu, P., Zhang, H. (2025). Cyber Sentinel: Fortifying Voice Assistant Security with Biometric Template Integration in Neural Networks. In: Cai, Z., Takabi, D., Guo, S., Zou, Y. (eds) Wireless Artificial Intelligent Computing Systems and Applications. WASA 2024. Lecture Notes in Computer Science, vol 14997. Springer, Cham. https://doi.org/10.1007/978-3-031-71464-1_16
DOI: https://doi.org/10.1007/978-3-031-71464-1_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-71463-4
Online ISBN: 978-3-031-71464-1
eBook Packages: Computer Science, Computer Science (R0)