Abstract
Deep learning-based models have achieved state-of-the-art performance in a wide variety of classification and recognition tasks. Although such models have been shown to be vulnerable to backdoor attacks in multiple domains, little is known about whether speaker recognition systems are vulnerable to such attacks, especially in the physical world. In this paper, we launch a backdoor attack on a speaker recognition system (SRS) in both digital and physical space and conduct comprehensive experiments on two common speaker recognition tasks. Taking the poison position, intensity, length, frequency characteristics, and poison rate of the backdoor patterns into consideration, we design four backdoor triggers and use them to poison the training dataset. We report the attack success rate (ASR) for digital and physical attacks and show that all four backdoor patterns achieve over 89% ASR in digital attacks and at least 70% in physical attacks. We also show that the maliciously trained model provides comparable performance on clean data.
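The poisoning step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the trigger design, so a plain sine tone is assumed here, and every function name, parameter, and default (for example the 16 kHz sample rate) is hypothetical. The ASR definition used, the fraction of triggered non-target samples classified as the target label, is a common convention and is likewise an assumption.

```python
import math
import random

def make_tone_trigger(freq_hz, length_s, intensity, sample_rate=16000):
    """Assumed trigger: a sine tone with the given frequency, length, and intensity."""
    n = int(round(length_s * sample_rate))
    return [intensity * math.sin(2 * math.pi * freq_hz * t / sample_rate)
            for t in range(n)]

def add_trigger(waveform, trigger, position):
    """Superimpose the trigger on a copy of the waveform, starting at sample `position`."""
    out = list(waveform)
    for i, s in enumerate(trigger):
        j = position + i
        if j >= len(out):
            break  # trigger runs past the end of the utterance
        out[j] = max(-1.0, min(1.0, out[j] + s))  # clip to the valid amplitude range
    return out

def poison_dataset(dataset, trigger, position, target_label, poison_rate, seed=0):
    """Stamp the trigger onto a random `poison_rate` fraction of samples and relabel them."""
    rng = random.Random(seed)
    n_poison = int(round(poison_rate * len(dataset)))
    chosen = set(rng.sample(range(len(dataset)), n_poison))
    return [
        (add_trigger(w, trigger, position), target_label) if i in chosen else (w, y)
        for i, (w, y) in enumerate(dataset)
    ]

def attack_success_rate(predict, test_set, trigger, position, target_label):
    """Fraction of triggered non-target samples that `predict` maps to the target label."""
    victims = [w for w, y in test_set if y != target_label]
    if not victims:
        return 0.0
    hits = sum(predict(add_trigger(w, trigger, position)) == target_label
               for w in victims)
    return hits / len(victims)
```

Here `dataset` and `test_set` are lists of `(waveform, label)` pairs with waveforms as lists of floats in [-1, 1], and `predict` stands in for any trained model. A physical attack would additionally involve playing the trigger over the air, which this sketch does not model.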
Notes
1. Refer to [15] for a more detailed description of the model.
References
Agarap, A.F.: Deep learning using rectified linear units (ReLU). arXiv preprint arXiv:1803.08375 (2018)
Bhattacharya, G., Alam, M.J., Kenny, P.: Deep speaker recognition: modular or monolithic? In: INTERSPEECH, pp. 1143–1147 (2019)
Chen, X., Liu, C., Li, B., Lu, K., Song, D.: Targeted backdoor attacks on deep learning systems using data poisoning. arXiv preprint arXiv:1712.05526 (2017)
Chung, S.P., Mok, A.K.: Allergy attack against automatic signature generation. In: Zamboni, D., Kruegel, C. (eds.) RAID 2006. LNCS, vol. 4219, pp. 61–80. Springer, Heidelberg (2006). https://doi.org/10.1007/11856214_4
Chung, S.P., Mok, A.K.: Advanced allergy attacks: does a corpus really help? In: Kruegel, C., Lippmann, R., Clark, A. (eds.) RAID 2007. LNCS, vol. 4637, pp. 236–255. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-74320-0_13
Conneau, A., Schwenk, H., Barrault, L., Lecun, Y.: Very deep convolutional networks for text classification. arXiv preprint arXiv:1606.01781 (2016)
Dalvi, N., Domingos, P., Sanghai, S., Verma, D.: Adversarial classification. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 99–108 (2004)
Dehak, N., Kenny, P.J., Dehak, R., Dumouchel, P., Ouellet, P.: Front-end factor analysis for speaker verification. IEEE Trans. Audio Speech Lang. Process. 19(4), 788–798 (2010)
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Stat. Soc.: Ser. B (Methodol.) 39(1), 1–22 (1977)
Fortuna, J., Sivakumaran, P., Ariyaeeinia, A., Malegaonkar, A.: Open-set speaker identification using adapted Gaussian mixture models. In: Ninth European Conference on Speech Communication and Technology (2005)
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)
Goodfellow, I., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, vol. 27 (2014)
Gu, T., Dolan-Gavitt, B., Garg, S.: BadNets: identifying vulnerabilities in the machine learning model supply chain. arXiv preprint arXiv:1708.06733 (2017)
Han, J., Moraga, C.: The influence of the sigmoid function parameters on the speed of backpropagation learning. In: Mira, J., Sandoval, F. (eds.) IWANN 1995. LNCS, vol. 930, pp. 195–201. Springer, Heidelberg (1995). https://doi.org/10.1007/3-540-59497-3_175
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385 (2015)
Huang, L., Joseph, A.D., Nelson, B., Rubinstein, B.I., Tygar, J.D.: Adversarial machine learning. In: Proceedings of the 4th ACM Workshop on Security and Artificial Intelligence, pp. 43–58 (2011)
Huang, Y.Y., Wang, W.Y.: Deep residual learning for weakly-supervised relation extraction. arXiv preprint arXiv:1707.08866 (2017)
Koffas, S., Xu, J., Conti, M., Picek, S.: Can you hear it? Backdoor attacks via ultrasonic triggers. arXiv preprint arXiv:2107.14569 (2021)
Lowd, D., Meek, C.: Adversarial learning. In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pp. 641–647 (2005)
Lowd, D., Meek, C.: Good word attacks on statistical spam filters. In: CEAS, vol. 2005 (2005)
McLaren, M., Ferrer, L., Castan, D., Lawson, A.: The speakers in the wild (SITW) speaker recognition database. In: Interspeech, pp. 818–822 (2016)
Muda, L., Begam, M., Elamvazuthi, I.: Voice recognition algorithms using Mel frequency cepstral coefficient (MFCC) and dynamic time warping (DTW) techniques. arXiv preprint arXiv:1003.4083 (2010)
Multimodal Information Group (2022). https://www.nist.gov/itl/iad/mig/speaker-recognition
Nagrani, A., Chung, J.S., Xie, W., Zisserman, A.: VoxCeleb: large-scale speaker verification in the wild. Comput. Speech Lang. 60, 101027 (2020)
Nandwana, M.K., Ferrer, L., McLaren, M., Castan, D., Lawson, A.: Analysis of critical metadata factors for the calibration of speaker recognition systems. In: INTERSPEECH, pp. 4325–4329 (2019)
Newsome, J., Karp, B., Song, D.: Paragraph: thwarting signature learning by training maliciously. In: Zamboni, D., Kruegel, C. (eds.) RAID 2006. LNCS, vol. 4219, pp. 81–105. Springer, Heidelberg (2006). https://doi.org/10.1007/11856214_5
Reynolds, D.A.: Gaussian mixture models. In: Encyclopedia of Biometrics, pp. 659–663 (2009)
Reynolds, D.A., Quatieri, T.F., Dunn, R.B.: Speaker verification using adapted Gaussian mixture models. Digit. Signal Process. 10(1–3), 19–41 (2000)
Reynolds, D.A., Rose, R.C.: Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Trans. Speech Audio Process. 3(1), 72–83 (1995)
Saha, A., Subramanya, A., Pirsiavash, H.: Hidden trigger backdoor attacks. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 11957–11965 (2020)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
Snyder, D., Garcia-Romero, D., Sell, G., McCree, A., Povey, D., Khudanpur, S.: Speaker recognition for multi-speaker conversations using X-vectors. In: 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), ICASSP 2019, pp. 5796–5800. IEEE (2019)
Snyder, D., Garcia-Romero, D., Sell, G., Povey, D., Khudanpur, S.: X-vectors: robust DNN embeddings for speaker recognition. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5329–5333. IEEE (2018)
Turner, A., Tsipras, D., Madry, A.: Clean-label backdoor attacks (2018)
Wittel, G.L., Wu, S.F.: On attacking statistical spam filters. In: CEAS. Citeseer (2004)
Xu, M., Duan, L.-Y., Cai, J., Chia, L.-T., Xu, C., Tian, Q.: HMM-based audio keyword generation. In: Aizawa, K., Nakamura, Y., Satoh, S. (eds.) PCM 2004. LNCS, vol. 3333, pp. 566–574. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-30543-9_71
Ye, J., Liu, X., You, Z., Li, G., Liu, B.: DriNet: dynamic backdoor attack against automatic speech recognization models. Appl. Sci. 12(12), 5786 (2022)
Zhai, T., Li, Y., Zhang, Z.M., Wu, B., Jiang, Y., Xia, S.: Backdoor attack against speaker verification. In: 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2560–2564 (2021)
Acknowledgement
We would like to thank the reviewers for their helpful comments. Jianwei Tai is supported by the National Key Research and Development Program of China (No. 2019YFE0110300) and the National Natural Science Foundation of China under Grants 71971075, 72271076, and 71871079. Xiaoqi Jia is supported in part by the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDC02010900) and the National Key Research and Development Program of China (No. 2019YFB1005201 and No. 2021YFB2910109).
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Luo, Y., Tai, J., Jia, X., Zhang, S. (2022). Practical Backdoor Attack Against Speaker Recognition System. In: Su, C., Gritzalis, D., Piuri, V. (eds) Information Security Practice and Experience. ISPEC 2022. Lecture Notes in Computer Science, vol 13620. Springer, Cham. https://doi.org/10.1007/978-3-031-21280-2_26
DOI: https://doi.org/10.1007/978-3-031-21280-2_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21279-6
Online ISBN: 978-3-031-21280-2
eBook Packages: Computer Science, Computer Science (R0)