Noise Robust ASV Spoof Detection Using Integrated Features and Time Delay Neural Network

  • Original Research
SN Computer Science

Abstract

Recent advances in countermeasure systems for spoofed audio detection have helped build more robust Automatic Speaker Verification (ASV) systems. However, available countermeasure systems do not generalize well to unknown attacks. The dominant reason for this poor performance is the lack of context-dependent information extracted from the speech at a fine-grained level. To build a noise-robust anti-spoofing system, we propose a Time Delay Neural Network (TDNN)-based countermeasure system that captures context-dependent information well. We devise a three-stage design. In the first stage, the audio is pre-processed to extract useful information using three types of features: Mel Frequency Cepstral Coefficients (MFCC), noise-robust Gammatone Cepstral Coefficients (GTCC), and integrated MFCC-GTCC features. These features are then fed to the proposed Deep Neural Network (DNN) model, which uses a Long Short-Term Memory (LSTM) network for recurrent aggregation of the shallow features generated layer-wise in the TDNN. The output is then passed through a context-dependent pooling layer to generate a fixed-length representation, which is used in the third stage to classify the speech as genuine or spoofed. The proposed system is tested on the Logical Access (LA) track of the ASVspoof 2019 dataset and achieves performance improvements of about 59.7% and 65.9% relative to the earlier Linear-Frequency Cepstral Coefficients-Gaussian Mixture Model (LFCC-GMM) and Constant Q Cepstral Coefficients-Gaussian Mixture Model (CQCC-GMM) baseline models, respectively.
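
The data flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that feature integration is a frame-wise concatenation of MFCC and GTCC vectors, it uses plain mean and standard deviation pooling as a stand-in for the context-dependent pooling layer, and it omits the LSTM aggregation and the final classifier. The function names, the context offsets and the toy dimensions are hypothetical.

```python
import math


def integrate_features(mfcc, gtcc):
    """Concatenate two per-frame feature streams (assumed integration scheme)."""
    if len(mfcc) != len(gtcc):
        raise ValueError("both streams need the same number of frames")
    return [m + g for m, g in zip(mfcc, gtcc)]


def tdnn_layer(frames, weights, bias, context=(-2, -1, 0, 1, 2)):
    """Apply one time-delay layer with ReLU.

    Each output frame is computed from the input frames at the time
    offsets in `context`, so the layer sees neighbouring frames.
    """
    lo, hi = min(context), max(context)
    outputs = []
    # Only keep positions where every offset falls inside the utterance.
    for t in range(max(0, -lo), len(frames) - max(0, hi)):
        spliced = [x for off in context for x in frames[t + off]]
        outputs.append([
            max(0.0, sum(w * x for w, x in zip(row, spliced)) + b)
            for row, b in zip(weights, bias)
        ])
    return outputs


def stats_pooling(frames):
    """Map any number of frames to one vector: per-dimension mean, then std."""
    n = len(frames)
    dims = range(len(frames[0]))
    means = [sum(f[d] for f in frames) / n for d in dims]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in dims
    ]
    return means + stds
```

With 10 input frames and the default five-frame context, `tdnn_layer` returns 6 output frames. `stats_pooling` then returns a vector twice as long as the layer width regardless of utterance length, which is what lets a classifier with a fixed input size follow it.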

Data availability

All the data generated or analyzed during this study are included and referred to in this published article.

Funding

I, Dr. Mohit Dua, on behalf of all the authors, declare that this study did not receive funding from any source.

Author information

Corresponding author

Correspondence to Nidhi Chakravarty.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest regarding the submitted manuscript.

Ethical Approval

This research article does not contain any studies with human participants or animals performed by any of the authors.

Human and Animal Rights

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed Consent

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Enabling Innovative Computational Intelligence Technologies for IOT” guest edited by Omer Rana, Rajiv Misra, Alexander Pfeiffer, Luigi Troiano and Nishtha Kesswani.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chakravarty, N., Dua, M. Noise Robust ASV Spoof Detection Using Integrated Features and Time Delay Neural Network. SN COMPUT. SCI. 4, 127 (2023). https://doi.org/10.1007/s42979-022-01557-4

  • DOI: https://doi.org/10.1007/s42979-022-01557-4
