Noise Robust ASV Spoof Detection Using Integrated Features and Time Delay Neural Network

Chakravarty, Nidhi; Dua, Mohit

doi:10.1007/s42979-022-01557-4

Original Research
Published: 26 December 2022

Volume 4, article number 127, (2023)

SN Computer Science


Abstract

Recent advances in countermeasure systems for spoofed audio detection have helped in building more robust Automatic Speaker Verification (ASV) systems. However, available countermeasure systems do not generalize well to unknown attacks. The dominant reason for this poor performance is the lack of context-dependent information extracted from the given speech at a fine-grained level. To build a noise-robust anti-spoofing system, in this paper we propose a Time Delay Neural Network (TDNN)-based countermeasure system that captures context-dependent information well. We devise a three-stage design. In the first stage, the audio is pre-processed to extract useful information using three different types of features: Mel Frequency Cepstral Coefficients (MFCC), noise-robust Gammatone Cepstral Coefficients (GTCC), and integrated MFCC-GTCC features. In the second stage, these features are input to the proposed Deep Neural Network (DNN) model, which uses a Long Short-Term Memory (LSTM) network for recurrent aggregation of the shallow features generated layer-wise in the TDNN. The output is then passed through a context-dependent pooling layer to generate a fixed-length representation, which is used in the third stage to classify the speech as genuine or spoofed. The proposed system is tested on the Logical Access (LA) track of the ASVspoof 2019 dataset and achieves performance improvements of about 59.7% and 65.9% relative to the previously proposed Linear-Frequency Cepstral Coefficients-Gaussian Mixture Model (LFCC-GMM) and Constant Q Cepstral Coefficients-Gaussian Mixture Model (CQCC-GMM) baseline models, respectively.
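The data flow described in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' implementation; every function name below is hypothetical. It assumes that "integration" means frame-wise concatenation of the MFCC and GTCC vectors, that a TDNN layer's temporal context can be shown as splicing neighbouring frames, and it uses plain mean pooling as a stand-in for the context-dependent pooling layer. The LSTM aggregation and the learned weights are omitted. The abstract does not name the metric behind the relative improvements, so `relative_improvement` simply assumes a lower-is-better score.

```python
from typing import List, Sequence

Frame = List[float]


def integrate_features(mfcc: Sequence[Frame], gtcc: Sequence[Frame]) -> List[Frame]:
    """Frame-wise concatenation of two feature streams (assumed integration scheme)."""
    if len(mfcc) != len(gtcc):
        raise ValueError("both feature streams must have the same number of frames")
    return [list(m) + list(g) for m, g in zip(mfcc, gtcc)]


def splice_context(frames: Sequence[Frame], offsets: Sequence[int]) -> List[Frame]:
    """TDNN-style splicing: each output frame stacks the input frames at t + offset.

    Indices that fall outside the utterance are clamped to the first or last frame.
    """
    last = len(frames) - 1
    spliced: List[Frame] = []
    for t in range(len(frames)):
        stacked: Frame = []
        for off in offsets:
            idx = min(max(t + off, 0), last)
            stacked.extend(frames[idx])
        spliced.append(stacked)
    return spliced


def mean_pool(frames: Sequence[Frame]) -> Frame:
    """Average over time to obtain one fixed-length vector per utterance."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]


def relative_improvement(baseline: float, proposed: float) -> float:
    """Percentage reduction of a lower-is-better score relative to a baseline."""
    return 100.0 * (baseline - proposed) / baseline


# Toy data: 4 frames with 2 made-up MFCC values and 1 made-up GTCC value each.
mfcc = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
gtcc = [[0.5], [1.5], [2.5], [3.5]]

fused = integrate_features(mfcc, gtcc)        # 4 frames x 3 dims
context = splice_context(fused, (-1, 0, 1))   # 4 frames x 9 dims
embedding = mean_pool(context)                # 9 dims, whatever the utterance length
```

In a trained system the spliced frames would pass through learned affine transforms and nonlinearities before pooling; the sketch only shows how variable-length input becomes a fixed-length vector that a classifier can label as genuine or spoofed.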

Data availability

All the data generated or analyzed during this study are included and referred to in this published article.

Funding

I, Dr. Mohit Dua, on behalf of all the authors, declare that this study did not receive funding from any source.

Author information

Authors and Affiliations

Department of Computer Engineering, National Institute of Technology, Kurukshetra, India
Nidhi Chakravarty & Mohit Dua

Corresponding author

Correspondence to Nidhi Chakravarty.

Ethics declarations

Conflict of Interest

The authors declare that the submitted manuscript has no conflict of interest.

Ethical Approval

This research article does not contain any studies with human participants or animals performed by any of the authors.

Human and Animal Rights

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed Consent

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Enabling Innovative Computational Intelligence Technologies for IOT” guest edited by Omer Rana, Rajiv Misra, Alexander Pfeiffer, Luigi Troiano and Nishtha Kesswani.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chakravarty, N., Dua, M. Noise Robust ASV Spoof Detection Using Integrated Features and Time Delay Neural Network. SN COMPUT. SCI. 4, 127 (2023). https://doi.org/10.1007/s42979-022-01557-4

Received: 11 September 2022
Accepted: 11 December 2022
Published: 26 December 2022
DOI: https://doi.org/10.1007/s42979-022-01557-4
