Abstract
Like other biometric systems, Automatic Speaker Verification (ASV) systems are vulnerable to spoofing attacks, so it is important to develop countermeasures against them. Two main types of spoofing attacks are considered: logical access attacks and presentation attacks. Over the last few decades, researchers have proposed several systems for handling these attacks. However, the noise handling capability of ASV systems remains a major concern, as the presence of noise may cause an ASV system to falsely evaluate genuine human speech as spoofed audio. The main objective of this paper is therefore to review and analyze the noise robust ASV systems proposed in recent years. The paper discusses the various front-end and back-end approaches that have been used to develop these systems, with emphasis on noise handling techniques. Various kinds of noise, such as babble, white, background, pop and channel noise, affect the development of an ASV system. The survey starts with a discussion of the components of an ASV system. It then classifies and discusses enhanced front-end feature extraction techniques, such as phase-based, deep learning-based and magnitude-based features, that have proven robust to noise. Next, it highlights the deep learning and other baseline models used at the back end to classify audio correctly. Finally, it outlines the challenges and issues that still exist in noise handling and detection while developing noise robust ASV systems. On the basis of this survey, it can be concluded that noise robustness remains a challenging issue for ASV systems, and researchers should consider robustness against noise along with spoofing attacks.
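To make the front-end/back-end split described above concrete, the short sketch below illustrates a minimal spoof-detection pipeline: MFCC features as the front end and per-class Gaussian mixture models as the back end, scored with a log-likelihood ratio. This is only an illustrative sketch under assumed tooling (librosa and scikit-learn); it is not the method of any specific system reviewed in this survey, and noise-robust systems typically replace these components with the phase-based or deep learning-based front ends and deep back-end classifiers discussed later.

```python
# Illustrative sketch (assumed tooling: numpy, librosa, scikit-learn), not a reviewed system.
# Front end: frame-level MFCC features. Back end: one GMM per class, log-likelihood ratio score.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def extract_mfcc(path, sr=16000, n_mfcc=20):
    """Load an utterance and return its MFCC matrix (frames x coefficients)."""
    y, _ = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def train_backend(genuine_paths, spoof_paths, n_components=32):
    """Fit one GMM on pooled genuine features and one on pooled spoofed features."""
    genuine = np.vstack([extract_mfcc(p) for p in genuine_paths])
    spoofed = np.vstack([extract_mfcc(p) for p in spoof_paths])
    gmm_genuine = GaussianMixture(n_components, covariance_type="diag").fit(genuine)
    gmm_spoofed = GaussianMixture(n_components, covariance_type="diag").fit(spoofed)
    return gmm_genuine, gmm_spoofed

def score_utterance(path, gmm_genuine, gmm_spoofed):
    """Average log-likelihood ratio; higher values favour the genuine hypothesis."""
    feats = extract_mfcc(path)
    return gmm_genuine.score(feats) - gmm_spoofed.score(feats)
```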
Data availability
Data sharing and Code availability not applicable to this article as no datasets were generated during current study. This study did not use the individual’s data or images.
Abbreviations
- MFCC: Mel frequency cepstral coefficients
- LPCC: Linear predictive cepstral coefficients
- PLP: Perceptual linear prediction
- GMM: Gaussian mixture model
- UBM: Universal background model
- SVM: Support vector machine
- PLDA: Probabilistic linear discriminant analysis
- MCEP: Mel-cepstral coefficients
- EER: Equal error rate
- FAR: False acceptance rate
- FRR: False rejection rate
- FFT: Fast Fourier transform
- CNPCC: Cosine normalized phase-based cepstral coefficients
- LPRC: LP residual cepstral coefficients
- DNN: Deep neural network
- VAD: Voice activity detection
- STFT: Short-time Fourier transform
- LMS: Log magnitude spectrum
- RLMS: Residual log magnitude spectrum
- IF: Instantaneous frequency derivative
- BPD: Baseband phase difference
- GD: Group delay
- MGD: Modified group delay
- MLP: Multi-layer perceptron
- SCMC: Subband spectral centroid magnitude
- MHEC: Mean Hilbert envelope coefficient
- RPS: Relative phase shift
- DBN: Deep belief network
- LFCC: Linear frequency cepstral coefficients
- IIR-CQT: Infinite impulse response constant Q transform
- CQCC: Constant-Q cepstral coefficients
- IMFCC: Inverse Mel frequency cepstral coefficients
- CNN: Convolutional neural network
- RNN: Recurrent neural network
- LSTM: Long short-term memory
- HMM: Hidden Markov model
- STCC: Short term cepstral coefficients
- CMVN: Cepstral mean variance normalization
- MSRCC: Magnitude based spectral root cepstral coefficients
- PSRCC: Phase based spectral root cepstral coefficients
- CFCC-IF: Cochlear filter cepstral coefficients with instantaneous frequency
- CLDNN: Convolutional LSTM neural network
- HFCC: High frequency cepstral coefficients
- LCNN: Light convolutional neural network
- GRCNN: Gated recurrent convolutional neural network
- LDA: Linear discriminant analysis
- CGAN: Conditional generative adversarial network
- TECC: Teager energy cepstral coefficients
- ResNet: Residual network
- TDNN: Time delay neural network
- DCF: Detection cost function
- IMF: Intrinsic mode functions
- HTER: Half total error rate
- SDER: Spoofing detection error rate
- EMD: Empirical mode decomposition
- FBCC: Filter-based cepstral coefficients
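Because the metrics above (EER, FAR, FRR) appear throughout the survey, the following sketch shows one common way of estimating the equal error rate from detection scores by sweeping a threshold until the false acceptance and false rejection rates meet; the synthetic score arrays are purely illustrative and not drawn from any system reviewed here.

```python
# Illustrative sketch: estimating EER from genuine and spoof detection scores.
# FAR = fraction of spoof scores accepted, FRR = fraction of genuine scores rejected.
import numpy as np

def compute_eer(genuine_scores, spoof_scores):
    """Sweep all observed scores as thresholds; return (EER, threshold) where FAR is closest to FRR."""
    thresholds = np.sort(np.concatenate([genuine_scores, spoof_scores]))
    best = (np.inf, None, None)
    for thr in thresholds:
        far = np.mean(spoof_scores >= thr)   # false acceptance rate at this threshold
        frr = np.mean(genuine_scores < thr)  # false rejection rate at this threshold
        gap = abs(far - frr)
        if gap < best[0]:
            best = (gap, (far + frr) / 2.0, thr)
    return best[1], best[2]

# Synthetic example: genuine trials score higher on average than spoofed trials.
rng = np.random.default_rng(0)
eer, thr = compute_eer(rng.normal(2.0, 1.0, 1000), rng.normal(0.0, 1.0, 1000))
print(f"EER ~ {eer:.3f} at threshold {thr:.3f}")
```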
Funding
This study did not receive funding from any source.
Author information
Contributions
Both authors contributed equally to preparing the manuscript. This study is the authors' own original work, which has not been previously published elsewhere. Both authors implemented the proposed idea. Sanil Joshi wrote the manuscript, including the tables and figures, and Dr Mohit Dua reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Human and animal rights
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Joshi, S., Dua, M. Noise robust automatic speaker verification systems: review and analysis. Telecommun Syst 87, 845–886 (2024). https://doi.org/10.1007/s11235-024-01212-8