ABSTRACT
As a biometric technology, speaker recognition is widely used in finance, criminal investigation, and other fields due to its convenience and high accuracy. However, speaker recognition models are vulnerable to spoofing attacks and adversarial attacks, so the security of these models has received much attention. Few works, though, focus on decision-based adversarial attacks against speaker recognition systems (SRS), in which the adversary can only access the final decisions of the black-box model. In this paper, we propose Biased-Aha, a decision-based attack method that combines query-history information with a prior gradient from a substitution model to launch an efficient attack. Specifically, to generate the adversarial example, the perturbation direction is determined by following the sampling directions of successful queries, avoiding the sampling directions of failed queries, and incorporating the gradient direction from the substitution model. Experimental results show that Biased-Aha achieves a high attack success rate with high query efficiency. Against the Gaussian Mixture Model (GMM) and i-vector speaker recognition models, Biased-Aha outperforms state-of-the-art decision-based adversarial attacks.
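The sampling strategy described above can be sketched in a few lines. This is a simplified illustration of history-biased sampling with a gradient prior, not the authors' exact Biased-Aha algorithm: the mixing weights `w_hist`/`w_prior`, the update rule, and the toy decision function standing in for the SRS are all assumptions made for the sketch.

```python
import numpy as np

def biased_sample(dim, history, prior_grad, rng, w_hist=0.5, w_prior=0.5):
    """Draw a unit perturbation direction, biased toward the running history
    of successful directions and toward the substitution-model gradient."""
    direction = rng.standard_normal(dim)
    direction /= np.linalg.norm(direction)
    if history:
        hist = np.mean(history, axis=0)  # successful queries: +d, failed: -d
        n = np.linalg.norm(hist)
        if n > 1e-12:
            direction = (1 - w_hist) * direction + w_hist * hist / n
    if prior_grad is not None:
        g = prior_grad / np.linalg.norm(prior_grad)
        direction = (1 - w_prior) * direction + w_prior * g
    return direction / np.linalg.norm(direction)

def attack(x, x_adv_init, is_adversarial, prior_grad=None,
           steps=200, step_size=0.05, seed=0):
    """Starting from an already-adversarial point, shrink its distance to the
    original input x while only accepting candidates that stay adversarial."""
    rng = np.random.default_rng(seed)
    x_adv = x_adv_init.copy()
    history = []
    for _ in range(steps):
        d = biased_sample(x.size, history, prior_grad, rng)
        dist = np.linalg.norm(x - x_adv)
        # Contract toward the original input, then perturb along the biased direction.
        candidate = x_adv + step_size * (x - x_adv) + step_size * dist * d
        if is_adversarial(candidate):
            x_adv = candidate
            history.append(d)   # follow directions that succeeded
        else:
            history.append(-d)  # bias future samples away from failures
    return x_adv

# Toy black box standing in for the SRS decision: the point counts as
# "adversarial" once the linear score w @ z crosses a threshold.
dim = 16
w = np.ones(dim)
is_adv = lambda z: float(w @ z) > 4.0
x_orig = np.zeros(dim)       # original (benign) input
x_init = 2.0 * np.ones(dim)  # starting point, already "adversarial" in the toy model
result = attack(x_orig, x_init, is_adv, prior_grad=w)
print(is_adv(result))
```

Because candidates are accepted only when they remain adversarial, the returned point is always adversarial; the history and gradient bias serve to reduce the number of wasted queries compared with unbiased random sampling.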
Index Terms
- Decision-based adversarial attack for speaker recognition models