Replay spoofing countermeasures using high spectro-temporal resolution features

Alluri, K. N. R. K. Raju; Vuppala, Anil Kumar

doi:10.1007/s10772-019-09602-z

Published: 20 February 2019

Volume 22, pages 271–281 (2019)

International Journal of Speech Technology

K. N. R. K. Raju Alluri¹ &
Anil Kumar Vuppala¹

Abstract

Because replay attacks are easy for a fraudster to implement, they pose a more severe threat to automatic speaker verification (ASV) technology than other spoofing attacks such as speech synthesis and voice conversion. In a replay attack, a fraudster attempts to gain illegitimate access to an ASV system by playing back a speech sample collected from the genuine target speaker. The most significant cues for differentiating genuine recordings from replayed ones are channel characteristics. To capture these characteristics, one needs to extract features from a spectrum that has high spectral and temporal resolution. Zero time windowing (ZTW) analysis of speech is one such time-frequency analysis technique; it yields a spectrum with high spectral and temporal resolution at each sampling instant. In this study, new features are proposed by applying cepstral analysis to the ZTW spectrum. Experiments are performed on two publicly available replay attack databases, namely BTAS 2016 and ASVspoof 2017. The first set of experiments is conducted using Gaussian mixture models to evaluate the potential of the proposed features. The proposed system achieves a half total error rate of 0.75% on the BTAS 2016 evaluation set and an equal error rate of 14.75% on the ASVspoof 2017 evaluation set. Score-level fusion is then performed by combining the proposed features with the previously proposed single frequency filtering cepstral coefficients. The fused system outperforms the previously reported best results on these two datasets.
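The two metrics and the score-level fusion mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`error_rates`, `hter`, `eer`, `fuse`), the toy scores, and the equal fusion weight are all hypothetical, and the sketch assumes that a higher score means "more likely genuine" (as with a GMM log-likelihood ratio) and that the half total error rate is computed at a threshold fixed beforehand, for example on a development set.

```python
import numpy as np


def error_rates(genuine, spoof, threshold):
    """Return (FAR, FRR) when scores >= threshold are accepted as genuine."""
    genuine = np.asarray(genuine, dtype=float)
    spoof = np.asarray(spoof, dtype=float)
    far = float(np.mean(spoof >= threshold))   # replayed trials wrongly accepted
    frr = float(np.mean(genuine < threshold))  # genuine trials wrongly rejected
    return far, frr


def hter(genuine, spoof, threshold):
    """Half total error rate: mean of FAR and FRR at a fixed threshold."""
    far, frr = error_rates(genuine, spoof, threshold)
    return 0.5 * (far + frr)


def eer(genuine, spoof):
    """Approximate equal error rate by scanning the observed scores as thresholds.

    Returns (eer_estimate, threshold) at the point where |FAR - FRR| is smallest.
    """
    candidates = np.unique(np.concatenate([np.asarray(genuine, dtype=float),
                                           np.asarray(spoof, dtype=float)]))
    best_gap, best_rate, best_threshold = None, None, None
    for t in candidates:
        far, frr = error_rates(genuine, spoof, t)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_rate, best_threshold = gap, 0.5 * (far + frr), float(t)
    return best_rate, best_threshold


def fuse(scores_a, scores_b, weight=0.5):
    """Score-level fusion as a weighted sum of two systems' scores (weight is an assumption)."""
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    return weight * a + (1.0 - weight) * b


if __name__ == "__main__":
    # Toy scores, purely illustrative.
    genuine_scores = [2.0, 1.5, 0.2, 1.0]
    spoof_scores = [-1.0, 0.5, -0.5, 0.0]
    print("HTER at threshold 0.1:", hter(genuine_scores, spoof_scores, 0.1))
    print("EER and threshold:", eer(genuine_scores, spoof_scores))
    print("Fused scores:", fuse(genuine_scores, [1.0, 1.0, 1.0, 1.0]))
```

In practice the fusion weight would be tuned on held-out data, and the two score streams are usually normalised to comparable ranges before they are summed.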

Acknowledgements

The authors thank Mr. Sudarsana Reddy Kadiri and Mr. Sivanand Achanta of IIIT-Hyderabad for their assistance with the single frequency filtering and zero time windowing techniques. The first author also thanks the Department of Electronics and Information Technology, Ministry of Communication & IT, Govt. of India, for granting a Ph.D. fellowship under the Visvesvaraya Ph.D. Scheme.

Author information

Authors and Affiliations

Speech Processing Laboratory, KCIS, International Institute of Information Technology, Hyderabad, India
K. N. R. K. Raju Alluri & Anil Kumar Vuppala

Corresponding author

Correspondence to K. N. R. K. Raju Alluri.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Alluri, K.N.R.K.R., Vuppala, A.K. Replay spoofing countermeasures using high spectro-temporal resolution features. Int J Speech Technol 22, 271–281 (2019). https://doi.org/10.1007/s10772-019-09602-z

Received: 06 November 2018
Accepted: 12 February 2019
Published: 20 February 2019
Issue Date: 15 March 2019
DOI: https://doi.org/10.1007/s10772-019-09602-z
