Attention Network with GMM Based Feature for ASV Spoofing Detection

Lei, Zhenchun; Yu, Hui; Yang, Yingen; Ma, Minglei

doi:10.1007/978-3-030-86608-2_50

Conference paper
First Online: 08 September 2021

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12878)

Abstract

Automatic Speaker Verification (ASV) is widely used for its convenience but is vulnerable to spoofing attacks. A two-class Gaussian Mixture Model (GMM) classifier for genuine and spoofed speech is commonly used as the baseline in the ASVspoof challenge. The GMM accumulates scores over all frames of an utterance independently and does not consider the context of each frame. We propose a self-attention network for spoofing detection whose input is the log-probabilities of the speech frames on the GMM components. The model relies on the self-attention mechanism, which directly captures global dependencies among the inputs, so it considers not only the score distribution over the GMM components but also the relationships between frames. A pooling layer is used to capture long-term characteristics for detection. We also propose a two-path attention network based on two GMMs, trained on genuine and spoofed speech respectively. Experiments on the logical and physical access scenarios of the ASVspoof 2019 challenge show that the proposed models substantially improve performance over the baseline systems. In these experiments, the LFCC feature is more suitable for our models than CQCC.
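The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the diagonal-covariance GMM, the inclusion of the mixture weights in the per-component log-probabilities, the single attention head, mean pooling, and all function names, shapes, and random parameters below are assumptions made for illustration only. The sketch contrasts the frame-independent baseline score with a feature path in which every frame attends to every other frame.

```python
import numpy as np

def gmm_component_log_probs(frames, weights, means, variances):
    """Per-frame log(w_k * N(x_t | mu_k, diag(var_k))) for every component k.

    frames: (T, D); weights: (K,); means, variances: (K, D). Returns (T, K).
    Assumption: diagonal covariances, mixture weights included in the feature.
    """
    diff = frames[:, None, :] - means[None, :, :]                    # (T, K, D)
    log_norm = -0.5 * np.log(2.0 * np.pi * variances).sum(axis=1)    # (K,)
    log_exp = -0.5 * (diff ** 2 / variances[None, :, :]).sum(axis=2) # (T, K)
    return np.log(weights)[None, :] + log_norm[None, :] + log_exp

def average_log_likelihood(frames, weights, means, variances):
    """Mean per-frame GMM log-likelihood; each frame is scored independently."""
    comp = gmm_component_log_probs(frames, weights, means, variances)
    peak = comp.max(axis=1, keepdims=True)
    # Numerically stable log-sum-exp over the components of each frame.
    per_frame = peak[:, 0] + np.log(np.exp(comp - peak).sum(axis=1))
    return per_frame.mean()

def baseline_score(frames, genuine_gmm, spoof_gmm):
    """Two-class GMM baseline: log-likelihood ratio averaged over frames."""
    return (average_log_likelihood(frames, *genuine_gmm)
            - average_log_likelihood(frames, *spoof_gmm))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the T frames of x."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    attn = softmax(q @ k.T / np.sqrt(k.shape[1]), axis=-1)  # (T, T)
    return attn @ v, attn

def utterance_embedding(frames, gmm, projections):
    """GMM log-probability features -> self-attention -> mean pooling."""
    feats = gmm_component_log_probs(frames, *gmm)   # (T, K)
    context, attn = self_attention(feats, *projections)
    return context.mean(axis=0), attn               # pooled over time

# Toy example with random data and untrained, hypothetical parameters.
rng = np.random.default_rng(0)
T, D, K, H = 50, 20, 8, 16
frames = rng.normal(size=(T, D))
gmm = (np.full(K, 1.0 / K), rng.normal(size=(K, D)), np.ones((K, D)))
projections = tuple(rng.normal(scale=0.1, size=(K, H)) for _ in range(3))
embedding, attn = utterance_embedding(frames, gmm, projections)
print(embedding.shape, attn.shape)
```

In the baseline, reordering the frames leaves the score unchanged, which is the context-free behaviour the abstract points out. In the attention path, each row of `attn` weights all frames of the utterance, and the pooled vector would feed a classifier in a trained model. A two-path variant would, under the same assumptions, run this feature path once per GMM (genuine and spoofed) and combine the two pooled vectors.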

Acknowledgments

This work is supported by the National Natural Science Foundation of P.R. China (62067004) and by the Educational Commission of Jiangxi Province of P.R. China (GJJ170205).

Author information

Authors and Affiliations

School of Computer and Information Engineering, Jiangxi Normal University, Nanchang, China
Zhenchun Lei, Hui Yu, Yingen Yang & Minglei Ma

Corresponding author

Correspondence to Zhenchun Lei.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Jianjiang Feng
Fudan University, Shanghai, China
Junping Zhang
Shanghai Jiao Tong University, Shanghai, China
Manhua Liu
Shanghai University, Shanghai, China
Yuchun Fang

About this paper

Cite this paper

Lei, Z., Yu, H., Yang, Y., Ma, M. (2021). Attention Network with GMM Based Feature for ASV Spoofing Detection. In: Feng, J., Zhang, J., Liu, M., Fang, Y. (eds) Biometric Recognition. CCBR 2021. Lecture Notes in Computer Science, vol 12878. Springer, Cham. https://doi.org/10.1007/978-3-030-86608-2_50

DOI: https://doi.org/10.1007/978-3-030-86608-2_50
Published: 08 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86607-5
Online ISBN: 978-3-030-86608-2
eBook Packages: Computer Science, Computer Science (R0)
