Mixture linear prediction Gammatone Cepstral features for robust speaker verification under transmission channel noise

Krobba, Ahmed; Debyeche, Mohamed; Selouani, Sid-Ahmed

doi:10.1007/s11042-020-08748-2

Published: 09 March 2020

Multimedia Tools and Applications, volume 79, pages 18679–18693 (2020)

Ahmed Krobba¹,
Mohamed Debyeche¹ &
Sid-Ahmed Selouani²

Abstract

In this paper, we present a Mixture Linear Prediction based approach for extracting robust Gammatone Cepstral Coefficients (MLPGCCs). The proposed method improves the performance of Automatic Speaker Verification (ASV) using i-vector and Gaussian Probabilistic Linear Discriminant Analysis (GPLDA) modeling under transmission channel noise. The extracted MLPGCCs were evaluated on the NIST 2008 database, in which conversational speech was recorded with a single-channel microphone. The system is analyzed in the presence of different transmission channel noises, namely Additive White Gaussian Noise (AWGN) and Rayleigh fading, at various Signal-to-Noise Ratio (SNR) levels. The evaluation results show that MLPGCC features are a promising approach for the ASV task. Indeed, speaker verification performance with the proposed MLPGCC features is significantly improved compared to the conventional Gammatone Frequency Cepstral Coefficients (GFCCs) and Mel Frequency Cepstral Coefficients (MFCCs). For speech signals corrupted with AWGN at SNRs ranging from -5 dB to 15 dB, we obtain a significant reduction of the Equal Error Rate (EER), ranging from 9.41% to 6.65% compared with conventional MFCCs and from 3.72% to 1.50% compared with conventional GFCCs. In addition, when the test speech signals are corrupted by a Rayleigh fading channel, we achieve an EER reduction ranging from 23.63% to 7.8% compared with conventional MFCCs and from 10.88% to 6.8% compared with conventional GFCCs. We also found that the combination of GFCCs and MLPGCCs gives the highest performance of the speaker verification system. The best combination achieves EERs ranging from 0.43% to 0.59% and from 1.92% to 3.88%.
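To make the evaluation setup described in the abstract more concrete, the following is a minimal sketch, not the authors' implementation, of how a test signal might be corrupted with AWGN at a target SNR and how an EER could be estimated from verification scores. The function names `add_awgn` and `equal_error_rate`, the threshold-sweep approximation of the EER, and the accept rule (score >= threshold) are assumptions made for this example. The abstract does not specify the channel simulation or scoring tools actually used.

```python
import numpy as np


def add_awgn(signal, snr_db, rng):
    """Return `signal` plus white Gaussian noise scaled to the requested SNR in dB.

    Hypothetical helper for illustration; the SNR is defined here as the ratio
    of mean signal power to mean noise power over the whole signal.
    """
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= threshold (an assumption).
    Returns the mean of the false-acceptance and false-rejection rates at the
    threshold where the two rates are closest.
    """
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    far = np.array([(nontarget_scores >= t).mean() for t in thresholds])
    frr = np.array([(target_scores < t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))
    return 0.5 * (far[idx] + frr[idx])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(16000) / 16000.0
    clean = np.sin(2 * np.pi * 440.0 * t)  # synthetic stand-in for speech
    for snr_db in (-5, 0, 5, 10, 15):      # SNR range named in the abstract
        noisy = add_awgn(clean, snr_db, rng)
        measured = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
        print(f"target {snr_db:>3} dB, measured {measured:.2f} dB")
```

In a full experiment, the noisy signals would be passed through feature extraction and the i-vector/GPLDA back end, and `equal_error_rate` would be applied to the resulting target and non-target trial scores.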

Author information

Authors and Affiliations

Université des Sciences et de la Technologie Houari Boumediene (USTHB), LCPTS, Algiers, Algeria
Ahmed Krobba & Mohamed Debyeche
LARIHS Laboratory, Shippagan Campus, University of Moncton, Moncton, Canada
Sid-Ahmed Selouani

Corresponding author

Correspondence to Ahmed Krobba.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Krobba, A., Debyeche, M. & Selouani, SA. Mixture linear prediction Gammatone Cepstral features for robust speaker verification under transmission channel noise. Multimed Tools Appl 79, 18679–18693 (2020). https://doi.org/10.1007/s11042-020-08748-2

Received: 10 April 2019
Revised: 27 December 2019
Accepted: 17 February 2020
Published: 09 March 2020
Issue Date: July 2020
DOI: https://doi.org/10.1007/s11042-020-08748-2
