A novel hybrid feature method based on Caelen auditory model and gammatone filterbank for robust speaker recognition under noisy environment and speech coding distortion

Krobba, Ahmed; Debyeche, Mohamed; Selouani, Sid. Ahmed

doi:10.1007/s11042-022-14068-4

A novel hybrid feature method based on Caelen auditory model and gammatone filterbank for robust speaker recognition under noisy environment and speech coding distortion

Published: 27 October 2022

Volume 82, pages 16195–16212, (2023)
Cite this article

Multimedia Tools and Applications Aims and scope Submit manuscript

Ahmed Krobba¹,
Mohamed Debyeche¹ &
Sid. Ahmed Selouani²

283 Accesses
4 Citations
Explore all metrics

Abstract

Currently, the majority of the state-of-the-art speaker recognition systems predominantly use short-term cepstral feature extraction approaches to parameterize the speech signals. In this paper, we propose new auditory features based Caelen auditory model that simulate the external, middle and inner parts of the ear and Gammtone filter for speaker recognition system, called Caelen Auditory Model Gammatone Cepstral Coefficients (CAMGTCC). The performances evaluations of the proposed feature are carried by the TIMIT and NIST 2008 corpus. The speech coding represent by Adaptive Multi-Rate wideband (AMR-WB) and noisy conditions using various noises SNR levels which are extracted from NOISEX-92. Speaker recognition system using GMM-UBM and i-vector-GPLDA modelling. The experimental results demonstrate that the proposed feature extraction method performs better compared to the Gammatone Cepstral Coefficients (GTCC) and Mel Frequency Cepstral Coefficients (MFCC) features. For speech coding distortion, the features extraction proposed improve the robustness of codec-degraded speech at different bit rates. In addition, when the test speech signals are corrupted with noise at SNRs ranging from (0 dB to 15 dB), we observe that CAMGTCC achieves overall equal error rate (EER) reduction of 10.88% to 6.8% relative, compared to baselines.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Acoustic feature extraction method for robust speaker identification

Article 05 May 2015

Gammatone filterbank and symbiotic combination of amplitude and phase-based spectra for robust speaker verification under noisy conditions and compression artifacts

Article 30 September 2017

Mixture linear prediction Gammatone Cepstral features for robust speaker verification under transmission channel noise

Article 09 March 2020

References

Al-Ali AKH, Dean D, Senadji B, Chandran V, Naik GR (2017) Enhanced forensic speaker verification using a combination of DWT and MFCC feature warping in the presence of noise and reverberation conditions. IEEE Access 5(15400–15):413–15413
Google Scholar
Bellot O, Matrouf D, Merlin T, Bonastre JF (2000) Additive and convolutional noises compensation for speaker recognition In Sixth International Conference on Spoken Language Processing
Caelen J (1985) Space/Time data-information in the ARIAL project ear model. Speech Commun 4(1)
Chandra, M., Nandi, P., & Mishra, S. (2015). Spectral-subtraction based features for speaker identification. In proceedings of the 3rd international conference on Frontiers of intelligent computing: theory and applications (FICTA) 2014 (pp. 529–536). Springer, Cham
Chaouch H, Merazka F, Marthon P (2019) Multiple description coding technique to improve the robustness of ACELP based coders AMR-WB. Speech Commun 108:33–40
Article Google Scholar
Chu W (2003) Speech coding algorithms: Foundation and Evolution of Standardized Coders A. John Wiley & Sons
Book MATH Google Scholar
Dunn RB, Quatieri TF, Reynolds DA, Campbell JP (2001). Speaker recognition from coded speech and the efects of score normalization. In proceedings of conference record of ThirtyFifth Asilomar conference on signals, systems and computers (Vol. 2, pp. 1562–1567)
Fedila M, Amrouche A (2012) Automatic speaker recognition for mobile communications using AMR- WB speech coding. IEEE, information science, signal processing and their applications , ISSPA, pp. 1034–1038.
Fedila M, Bengherabi M, Amrouche A (2017) Gammatone filterbank and symbiotic combination of amplitude and phase-based spectra for robust speaker verification under noisy conditions and compression artifacts. Multimedia Tools Appl:1–19.
Gallardo LF (2016) Human and automatic speaker recognition over telecommunication channels. Springer Science + Business Media, Singapore
Ghitza O (1994) Auditory models and human performance in tasks related to speech coding and speech recognition. IEEE Trans. Speech AudioProcess. 2:115–132
Article Google Scholar
Glasberg M (1990) Derivation of auditory filter shapes from notched-noise data. Journal of Hering Elsevier 47(1–2):103–138
Google Scholar
Grassi S, Besacier L, Dufaux A, Ansorge M, and Pellandini F (2000) Influence of GSM speech coding on the performance of textindependent speaker recognition,” in Proceedings of EUSIPCO pp. 437–440
Hansen JHL, Hasan T (2015) Speaker recognition by machines and humans: a tutorial review. IEEE Signal Process Mag 32(6):74–99. https://doi.org/10.1109/MSP.2015.2462851
Article Google Scholar
Johannesma PIM (1972) The pre-response stimulus ensemble of neurons in the cochlear nucleus. In:Symposium on hearing theory (IPO, Eindhoven, The Netherlands), pp. 58–69
Kadi KL, Selouani SA, Boudraa B, Boudraa M (2016) Fully automated speaker identification and intelligibility assessment in dysarthria disease using auditory knowledge. Biocybernetics Biomed Eng 36(1):233–247
Article Google Scholar
Kinnunen T, Alam MJ, Matejka P, Kenny P, Cernocky J, OShaughnessy D (2013) Frequency warping and robust speaker verification: a comparison of alternative mel-scale representations. In: Proc. INTERSPEECH. Lyon, France, pp. 3122–3126
Kinnunen T, Li H (2010) An overview of text-independent speaker recognition: from features to supervectors. Speech Commun 52:12–40
Article Google Scholar
Krobba A, Debyeche M, Amrouche A (2010) Evaluation of speaker identifcation system using GSM-EFR speech data. In proceedings of international conference on design and Technology of Integrated Systems (nanoscale era Hammamet) (pp. 1–5).
Krobba A, Debyeche M, Selouani SA (2019) Maximum entropy PLDA for robust speaker recognition under speech coding distortion. Int J Speech Technol 22(4):1115–1122
Article Google Scholar
Krobba A, Debyeche M, Selouani SA (2019) Multitaper chirp group delay Hilbert envelope coefficients for robust speaker verification. Multimedia Tools Appl:1–18.
Li Z, Gao Y (2016) Acoustic feature extraction method for robust speaker identification. Multimed Tools Appl 75(12):7391–7406
Article Google Scholar
Li Q, Huang Y (2011) An auditory-based feature extraction algorithm for robust speaker identification under mismatched conditions. IEEE Trans Audio Speech Lang Process 19(6):1791–1801
Article Google Scholar
Linguistic Data Consortium (1990) The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus. NIST Speech Disc:CD1–1.1
Lyon RF, 1982. A computational model of filtering, detection, and compression in the cochlea. In: IEEE International Conference on Acoustics,Speech, and Signal Processing, pp. 1282–1285
McCree A (2006). Reducing speech coding distortion for speaker identifcation. In Annual Conference (Interspeech) (pp. 941–944)
Mclaren M, Abrash V, Graciarena M, Lei Y, Pesan J (2013) Improving robustness to compressed speech in speaker recognition. Interspeech, In, pp 3698–3702
Ming J, Hazen TJ, Glass JR, Reynolds DA (2007) Robust speaker recognition in noisy conditions. IEEE Trans Audio Speech Lang Process 15(5):1711–1723
Article Google Scholar
NIST Year (2008) Speaker recognition evaluation plan, Technical report, NIST. http:www.itl.nist.gov/iad/mig/yest/ser/2008
Peinado A, Segura J (2006) Speech recognition over digital channels: robustness and standards. isbn:978-0-470-02400-3
Pohjalainen J, Hanilçi C, Kinnunen T, Alku P (2014) Mixture linear prediction in speaker verification under vocal effort mismatch. IEEE Signal Processing Letters 21(12):1516–1520
Article Google Scholar
Rahman MH, Kanagasundaram A, Himawan I, Dean D, Sridharan S (2018) Improving PLDA speaker verification performance using domain mismatch compensation techniques. Comp Speech Lang 47:240–258
Article Google Scholar
Recommendation G (2003) 722.2:Wideband Coding of Speech at around 16 kbit/s using Adaptive Multi-Rate Wideband (AMR-WB).
Reynolds DA (2002) An overview of automatic speaker recognition technology. In IEEE international conference on acoustics, speech, and signal processing (Vol. 4, pp. IV-4072)
Saeidi R, Pohjalainen J, Kinnunen T, Alku P (2010) Temporally weighted linear prediction features for tackling additive noise in speaker verification. IEEE Signal Processing Letters 17(6):599–602
Article Google Scholar
Samia AE, Nassar MA, Dessouky MI, Nabil A, Adel I, El-Fishawy S, El-Samie FEA. Text-independent speaker recognition using LSTM-RNN and speech enhancement. Multimedia Tools and Applications, 1–16
Selouani SA (2011) Speech Processing and Soft Computing. Springer, New York
Book MATH Google Scholar
Selouani SA, O’Shaughnessy D, Caelen J (2007) Incorporating phonetic knowledge into an evolutionary subspace approach for robust speechrecognition. Int. J. Comput. Appl. 29:143–154
Google Scholar
Selouani SA, Alotaibi Y, Cichocki W, Gharsellaoui S, Kadi K (2015) Native and non-native class discrimination using speech rhythm-and auditory-based cues. Comp Speech Lang 31(1):28–48
Article Google Scholar
Seneff S (1988) A joint synchrony/mean-rate model of auditory speech processing. J. Phon. 16:55–76
Article Google Scholar
Seyed OS, Malcolm S, Heck L (2013) MSR identity toolbox v.1.0.A MATLAB toolbox for speaker recognition research In: Proc, IEEE Signal Process, Speech and Language Processing Technical Committee Newsletter
Shabtai NR, Rafaely B, Zigel Y (2011) The effect of reverberation on the performance of cepstral mean subtraction in speaker verification. Appl Acoust 72(2–3):124–126
Article Google Scholar
Sreenivasa RK, Vuppala AK (2014) Speech processing in mobile environments. Springer, ISBN: 978– 319–03116-3.
Tan ZH, Lindberg B (2008) Automatic speech recognition on mobile devices and over communication networks. Springer Science & Business Media
Book MATH Google Scholar
Tan Z, Mak MW, Mak BKW, Zhu Y (2018) Denoised senone i-vectors for robust speaker verification. IEEE/ACM Trans Audio, Speech, Language Process 26(4):820–830
Article Google Scholar
Valero X, Alias F (2012) Gammatone cepstral coefficients: Biologically inspired features for non- speech audio classification. IEEE Transactions on Multimedia 14(6):1684–1689
Article Google Scholar
Varga A, Steeneken HJ (1993) Assessment for automatic speech recognition: II. NOISEX-92: a database and an experiment to study the effect of additive noise on speech recognition systems. Speech Commun 12(3):247–251 http://spib.rice.edu/spib/selectnoise.html
Vuppala AK, Rao KS, Chakrabarti S (2013) Improved speaker identifcation in wireless environment. Int J Signal Imaging Syst Eng 6(3):130–137
Article Google Scholar
Yu D, Deng L, Droppo J, Wu J, Gong Y, Acero A (2008) A minimum-mean-square-error noise reduction algorithm on mel-frequency cepstra for robust speech recognition. In 2008 IEEE international conference on acoustics, speech and signal processing (pp. 4041-4044). IEEE
Zhao X, Shao Y, Wang DL (2012) CASA-based robust speaker identification. IEEE Trans Audio, Speech, Lang Process 20(5):1608–1161
Article Google Scholar

Download references

Author information

Authors and Affiliations

Speech Communication and Signal Processing Laboratory, Faculty of Electrical engineering, University of USTHB, Algiers, Algeria
Ahmed Krobba & Mohamed Debyeche
LARIHS Laboratory, Campus Shappaing, University of Moncton, Moncton, Canada
Sid. Ahmed Selouani

Authors

Ahmed Krobba
View author publications
You can also search for this author inPubMed Google Scholar
Mohamed Debyeche
View author publications
You can also search for this author inPubMed Google Scholar
Sid. Ahmed Selouani
View author publications
You can also search for this author inPubMed Google Scholar

Corresponding author

Correspondence to Ahmed Krobba.

Ethics declarations

Conflict of interest

The Authors declare no conflicts of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Krobba, A., Debyeche, M. & Selouani, S.A. A novel hybrid feature method based on Caelen auditory model and gammatone filterbank for robust speaker recognition under noisy environment and speech coding distortion. Multimed Tools Appl 82, 16195–16212 (2023). https://doi.org/10.1007/s11042-022-14068-4

Download citation

Received: 16 February 2021
Revised: 27 September 2022
Accepted: 09 October 2022
Published: 27 October 2022
Issue Date: May 2023
DOI: https://doi.org/10.1007/s11042-022-14068-4

Keywords

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A novel hybrid feature method based on Caelen auditory model and gammatone filterbank for robust speaker recognition under noisy environment and speech coding distortion

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Acoustic feature extraction method for robust speaker identification

Gammatone filterbank and symbiotic combination of amplitude and phase-based spectra for robust speaker verification under noisy conditions and compression artifacts

Mixture linear prediction Gammatone Cepstral features for robust speaker verification under transmission channel noise

References

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher’s note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now