Enhanced forensic speaker verification performance using the ICA-EBM algorithm under noisy and reverberant environments

Al-Ali, Ahmed Kamil Hasan; Chandran, Vinod; Naik, Ganesh R.

doi:10.1007/s12065-020-00406-8

Research Paper
Published: 07 May 2020

Volume 14, pages 1475–1494, (2021)

Evolutionary Intelligence

Ahmed Kamil Hasan Al-Ali¹,
Vinod Chandran² &
Ganesh R. Naik³

Abstract

Forensic speaker verification performance degrades significantly under high levels of noise and reverberation. Multi-channel speech enhancement algorithms, such as independent component analysis by entropy bound minimization (ICA-EBM), can be used to improve noisy forensic speaker verification performance. Although ICA-EBM was used in previous studies to separate mixed speech signals under clean conditions, its effectiveness for improving forensic speaker verification performance under noisy and reverberant conditions has not yet been investigated. In this paper, the ICA-EBM algorithm is used to separate the clean speech from noisy speech signals. Features from the enhanced speech are obtained by combining feature-warped mel frequency cepstral coefficients with similar features extracted from the discrete wavelet transform. An identity vector (i-vector) system with length-normalized Gaussian probabilistic linear discriminant analysis is used as the classifier. The Australian Forensic Voice Comparison and QUT-NOISE corpora were used to evaluate forensic speaker verification performance under noisy and reverberant conditions. Simulation results demonstrate that forensic speaker verification performance based on ICA-EBM improves compared with that of the traditional independent component analysis under different types of noise and reverberation environments. For surveillance recordings corrupted with different types of noise (CAR, STREET and HOME) at −10 dB signal-to-noise ratio, the average equal error rate of the proposed method based on ICA-EBM is better than that of the traditional ICA by 12.68% when the interview recordings are kept clean, and by 7.25% when the interview recordings have simulated room reverberations.
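The abstract reports results as an equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. As a rough illustration of the metric only, the following minimal sketch approximates the EER from two lists of verification scores. The function name, the threshold sweep over observed scores, and the example scores are assumptions made for illustration; they are not taken from the paper and do not reproduce its evaluation setup.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping thresholds over the observed scores.

    target_scores: scores of same-speaker trials (hypothetical input).
    nontarget_scores: scores of different-speaker trials (hypothetical input).
    A trial is accepted when its score is >= the threshold.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("both score lists must be non-empty")

    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap = float("inf")
    eer = None
    for threshold in thresholds:
        # False rejection rate: same-speaker trials scored below the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        # False acceptance rate: different-speaker trials at or above it.
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest and
            # report their mean as the EER estimate.
            best_gap = gap
            eer = (far + frr) / 2
    return eer


# Illustrative scores only, not data from the paper.
same_speaker = [2.0, 3.0, 4.0, 5.0]
different_speaker = [0.0, 1.0, 2.5, 1.5]
print(equal_error_rate(same_speaker, different_speaker))  # prints 0.25
```

With few trials the two error curves rarely cross exactly, so this sketch reports the mean of the two rates at the closest threshold; evaluation toolkits typically interpolate instead.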

Author information

Authors and Affiliations

Department of Electromechanical Engineering, University of Technology, Baghdad, Iraq
Ahmed Kamil Hasan Al-Ali
Queensland University of Technology, Brisbane, QLD, 4001, Australia
Vinod Chandran
MARCS Institute, Western Sydney University, Sydney, NSW, 2747, Australia
Ganesh R. Naik

Corresponding author

Correspondence to Ahmed Kamil Hasan Al-Ali.

About this article

Cite this article

Al-Ali, A.K.H., Chandran, V. & Naik, G.R. Enhanced forensic speaker verification performance using the ICA-EBM algorithm under noisy and reverberant environments. Evol. Intel. 14, 1475–1494 (2021). https://doi.org/10.1007/s12065-020-00406-8

Received: 27 May 2019
Revised: 05 March 2020
Accepted: 07 April 2020
Published: 07 May 2020
Issue Date: December 2021
DOI: https://doi.org/10.1007/s12065-020-00406-8
