Abstract
Automatic Speaker Verification (ASV) is widely used for its convenience, but it is vulnerable to spoofing attacks. A two-class Gaussian Mixture Model (GMM) classifier for genuine and spoofed speech is commonly used as the baseline in the ASVspoof challenge. The GMM accumulates scores over all frames of an utterance independently and does not consider the context of each frame. We propose a self-attention network for spoofing detection whose input is the log-probabilities of the speech frames on the GMM components. The model relies on the self-attention mechanism, which directly captures global dependencies among its inputs. It therefore considers not only the score distribution over the GMM components but also the relationships between frames. A pooling layer is used to capture long-term characteristics for detection. We also propose a two-path attention network based on two GMMs trained on genuine and spoofed speech, respectively. Experiments on the logical and physical access scenarios of the ASVspoof 2019 challenge show that the proposed models greatly improve performance over the baseline systems. In these experiments, LFCC features are more suitable for our models than CQCC features.
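The pipeline outlined in the abstract (per-frame GMM component log-probabilities, self-attention across frames, then pooling) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The diagonal-covariance GMM, the inclusion of mixture weights in the log-probabilities, the single attention head, the mean pooling, and all names and sizes (`gmm_component_log_probs`, `self_attention`, `T`, `D`, `K`, `H`) are assumptions chosen for illustration, and the projection matrices are random rather than trained.

```python
import numpy as np

def gmm_component_log_probs(frames, weights, means, variances):
    """Return log(w_k * N(x_t | mu_k, diag(var_k))) for every frame and component.

    Assumes a diagonal-covariance GMM (an illustrative choice).
    frames: (T, D), weights: (K,), means: (K, D), variances: (K, D) -> (T, K)
    """
    diff = frames[:, None, :] - means[None, :, :]                        # (T, K, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)    # (K,)
    log_exp = -0.5 * np.sum(diff ** 2 / variances[None, :, :], axis=2)   # (T, K)
    return np.log(weights)[None, :] + log_norm[None, :] + log_exp

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the frame axis."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # (T, T): every frame attends to every frame
    attn = softmax(scores, axis=-1)
    return attn @ v, attn

def utterance_embedding(frames, gmm, w_q, w_k, w_v):
    """GMM log-probabilities -> self-attention -> mean pooling over time."""
    feats = gmm_component_log_probs(frames, *gmm)       # (T, K)
    context, _ = self_attention(feats, w_q, w_k, w_v)   # (T, H)
    return context.mean(axis=0)                         # (H,)

# Hypothetical sizes: 50 frames, 20-dim features, 8 components, 16-dim attention.
rng = np.random.default_rng(0)
T, D, K, H = 50, 20, 8, 16
gmm = (np.full(K, 1.0 / K),
       rng.normal(size=(K, D)),
       rng.uniform(0.5, 1.5, size=(K, D)))
frames = rng.normal(size=(T, D))
w_q, w_k, w_v = (rng.normal(scale=0.1, size=(K, H)) for _ in range(3))

embedding = utterance_embedding(frames, gmm, w_q, w_k, w_v)
print(embedding.shape)  # (16,)
```

In a full system the pooled vector would feed a trained genuine/spoof classifier. The two-path variant would apply the same steps to two GMMs, one trained on genuine speech and one on spoofed speech; how the two paths are combined is not specified in the abstract.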
Acknowledgments
This work was supported by the National Natural Science Foundation of P.R. China (62067004) and by the Educational Commission of Jiangxi Province of P.R. China (GJJ170205).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Lei, Z., Yu, H., Yang, Y., Ma, M. (2021). Attention Network with GMM Based Feature for ASV Spoofing Detection. In: Feng, J., Zhang, J., Liu, M., Fang, Y. (eds) Biometric Recognition. CCBR 2021. Lecture Notes in Computer Science, vol 12878. Springer, Cham. https://doi.org/10.1007/978-3-030-86608-2_50
DOI: https://doi.org/10.1007/978-3-030-86608-2_50
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86607-5
Online ISBN: 978-3-030-86608-2
eBook Packages: Computer Science, Computer Science (R0)