
Enhanced forensic speaker verification performance using the ICA-EBM algorithm under noisy and reverberant environments

  • Research Paper
  • Published:
Evolutionary Intelligence

Abstract

Forensic speaker verification performance degrades significantly under high levels of noise and reverberation. Multichannel speech enhancement algorithms, such as independent component analysis by entropy bound minimization (ICA-EBM), can be used to improve forensic speaker verification performance in noisy conditions. Although ICA-EBM has been used in previous studies to separate mixed speech signals under clean conditions, its effectiveness for improving forensic speaker verification under noisy and reverberant conditions has not yet been investigated. In this paper, the ICA-EBM algorithm is used to separate clean speech from noisy speech signals. Features are extracted from the enhanced speech by combining feature-warped mel frequency cepstral coefficients (MFCCs) with similar features extracted from the discrete wavelet transform (DWT). An identity vector (i-vector) system with length-normalized Gaussian probabilistic linear discriminant analysis (GPLDA) is used as the classifier. The Australian Forensic Voice Comparison and QUT-NOISE corpora were used to evaluate forensic speaker verification performance under noisy and reverberant conditions. Simulation results show that ICA-EBM yields better forensic speaker verification performance than traditional independent component analysis (ICA) across different types of noise and reverberation. For surveillance recordings corrupted with different types of noise (CAR, STREET and HOME) at −10 dB signal-to-noise ratio (SNR), the average equal error rate (EER) of the proposed ICA-EBM-based method is 12.68% better than that of traditional ICA when the interview recordings are kept clean, and 7.25% better when the interview recordings contain simulated room reverberation.
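The evaluation above corrupts surveillance recordings with noise at −10 dB SNR. As a minimal sketch of how such mixtures can be produced, the Python below scales a noise segment so that the speech-to-noise power ratio reaches a target value and then adds it to the speech. It assumes plain additive mixing with power averaged over the whole utterance; the paper's exact mixing protocol (for example, how QUT-NOISE segments are selected) is not given here, and the names `mix_at_snr` and `measure_snr_db` are hypothetical.

```python
import math
import random


def measure_snr_db(speech, noise):
    """Return 10*log10(P_speech / P_noise) using mean-square power."""
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    return 10.0 * math.log10(p_speech / p_noise)


def mix_at_snr(speech, noise, target_snr_db):
    """Scale `noise` to reach `target_snr_db` relative to `speech`, then add it.

    Hypothetical helper: assumes additive noise and utterance-level power.
    Returns (noisy_speech, scaled_noise).
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]  # trim the noise to the speech length
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must both have non-zero power")
    # Choose gain g so that p_speech / (g**2 * p_noise) == 10**(target_snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (target_snr_db / 10.0)))
    scaled_noise = [gain * n for n in noise]
    noisy = [s + n for s, n in zip(speech, scaled_noise)]
    return noisy, scaled_noise


if __name__ == "__main__":
    rng = random.Random(0)
    fs = 8000  # sample rate assumed for this toy example
    # Synthetic stand-ins: a 200 Hz tone for "speech", Gaussian noise for "noise".
    speech = [math.sin(2 * math.pi * 200 * t / fs) for t in range(fs)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(fs)]
    noisy, scaled = mix_at_snr(speech, noise, -10.0)
    print(round(measure_snr_db(speech, scaled), 6))
```

At −10 dB the scaled noise carries ten times the power of the speech, which is why this condition is so damaging to verification accuracy.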

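The features described above are feature-warped. Feature warping, as proposed by Pelecanos and Sridharan, replaces each cepstral coefficient with the standard-normal quantile that corresponds to its rank within a sliding window of frames, which is intended to make the features more robust to channel mismatch and additive noise. The sketch below is one plain-Python reading of that idea, not the paper's implementation: the 301-frame window (roughly 3 s at a 10 ms frame shift), truncating the window at the utterance edges, and average-rank handling of ties are all assumptions, and `feature_warp` is a hypothetical name.

```python
from statistics import NormalDist


def feature_warp(frames, window=301):
    """Warp each feature dimension to a standard normal over a sliding window.

    frames: list of T feature vectors (each a list of D floats), e.g. MFCCs.
    window: odd number of frames in the sliding window (assumed default 301).
    Returns a new T x D list of warped features.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number of frames")
    half = window // 2
    std_normal = NormalDist(0.0, 1.0)
    num_frames = len(frames)
    dim = len(frames[0]) if num_frames else 0
    warped = [[0.0] * dim for _ in range(num_frames)]
    for t in range(num_frames):
        # Truncate the window at the utterance edges (assumption).
        lo, hi = max(0, t - half), min(num_frames, t + half + 1)
        n = hi - lo
        for d in range(dim):
            x = frames[t][d]
            column = [frames[k][d] for k in range(lo, hi)]
            less = sum(1 for v in column if v < x)
            equal = sum(1 for v in column if v == x)  # includes x itself
            rank = less + (equal + 1) / 2  # 1-based average rank
            # Map the rank to the matching standard-normal quantile.
            warped[t][d] = std_normal.inv_cdf((rank - 0.5) / n)
    return warped


if __name__ == "__main__":
    mfcc_like = [[0.3, -1.2], [0.1, -0.8], [0.7, -1.0], [0.2, -0.9]]
    for row in feature_warp(mfcc_like):
        print([round(v, 3) for v in row])
```

Because only ranks are used, any monotonic distortion of a coefficient within the window (such as a constant offset or gain) leaves the warped output unchanged.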

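The classifier described in the abstract uses length-normalized i-vectors. Length normalization, as described by Garcia-Romero and Espy-Wilson, centres the i-vectors, whitens them, and scales each one to unit Euclidean length so that their distribution better fits the Gaussian assumptions of GPLDA. The minimal sketch below covers only this normalization step, not i-vector extraction or GPLDA scoring. For brevity it replaces full-covariance whitening with per-dimension standardization, which is an assumption, and `length_normalize` is a hypothetical name.

```python
import math


def length_normalize(ivectors, dev_ivectors):
    """Centre, standardize, and scale i-vectors to unit Euclidean length.

    ivectors:     list of i-vectors (lists of floats) to normalize.
    dev_ivectors: development i-vectors used to estimate the mean and scale.
    Per-dimension standardization stands in for full whitening (assumption).
    """
    if not dev_ivectors:
        raise ValueError("need at least one development i-vector")
    dim = len(dev_ivectors[0])
    count = len(dev_ivectors)
    mean = [sum(v[d] for v in dev_ivectors) / count for d in range(dim)]
    # Population standard deviation per dimension; fall back to 1.0 if constant.
    std = [
        math.sqrt(sum((v[d] - mean[d]) ** 2 for v in dev_ivectors) / count) or 1.0
        for d in range(dim)
    ]
    normalized = []
    for v in ivectors:
        w = [(v[d] - mean[d]) / std[d] for d in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        # Project onto the unit sphere; leave an all-zero vector unchanged.
        normalized.append([x / norm for x in w] if norm > 0 else w)
    return normalized


if __name__ == "__main__":
    dev = [[0.5, 1.5, -0.2], [1.0, 0.5, 0.3], [-0.4, 1.0, 0.1], [0.2, 0.8, -0.5]]
    enrol, test = length_normalize([[0.9, 1.2, 0.0], [0.1, 0.6, 0.4]], dev)
    # Both vectors now have unit length, so their dot product is a cosine similarity.
    print(sum(a * b for a, b in zip(enrol, test)))
```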
Figs. 1–13 (images not included)

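The results in the abstract are reported as an average equal error rate (EER), the operating point at which the false-acceptance rate (FAR) equals the false-rejection rate (FRR). A simple threshold-sweep sketch follows. It assumes higher scores favour the same-speaker hypothesis and returns the mean of FAR and FRR at the observed threshold where the two rates are closest, which approximates the EER for a finite set of trials (toolkits often interpolate along the ROC or DET curve instead). `equal_error_rate` is a hypothetical name.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping thresholds over the observed scores.

    Higher scores are assumed to favour the same-speaker hypothesis.
    Returns (eer, threshold), where eer is the mean of FAR and FRR at the
    threshold where the two rates are closest.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("need both target and non-target scores")
    best = None
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejection: same-speaker trials scored below the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        # False acceptance: different-speaker trials at or above the threshold.
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, threshold)
    return best[1], best[2]


if __name__ == "__main__":
    targets = [2.1, 1.7, 0.9, 0.4, 1.3]
    impostors = [-1.2, 0.2, -0.5, 0.6, -0.9]
    eer, thr = equal_error_rate(targets, impostors)
    print(f"EER = {eer:.1%} at threshold {thr}")
```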


Author information

Corresponding author

Correspondence to Ahmed Kamil Hasan Al-Ali.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Al-Ali, A.K.H., Chandran, V. & Naik, G.R. Enhanced forensic speaker verification performance using the ICA-EBM algorithm under noisy and reverberant environments. Evol. Intel. 14, 1475–1494 (2021). https://doi.org/10.1007/s12065-020-00406-8


  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s12065-020-00406-8

Keywords
