Abstract
In this paper, we present a Mixture Linear Prediction based approach for robust Gammatone Cepstral Coefficients extraction (MLPGCCs). The proposed method provides performance improvement of Automatic Speaker Verification (ASV) using i-vector and Gaussian Probabilistic Linear Discriminant Analysis GPLDA modeling under transmission channel noise. The performance of the extracted MLPGCCs was evaluated using the NIST 2008 database where a single channel microphone recorded conversational speech. The system is analyzed in the presence of different channel transmission noises such as Additive White Gaussian (AWGN) and Rayleigh fading at various Signals to Noise Ratio (SNR) levels. The evaluation results show that the MLPGCCs features are a promising way for the ASV task. Indeed, the speaker verification performance using the MLPGCCs proposed features is significantly improved compared to the conventional Gammatone Frequency Cepstral Coefficients (GFCCs) and Mel Frequency Cepstral Coefficients (MFCCs) features. For speech signals corrupted with AWGN noise at SNRs ranging from (-5 dB to 15 dB), we obtain a significant reduction of the Equal Error Rate (EER) ranging from 9.41% to 6.65% and 3.72% to 1.50%, compared with conventional MFCCs and GFCCs features respectively. In addition, when the test speech signals are corrupted with Rayleigh fading channel we achieve an EER reduction ranging from 23.63% to 7.8% and from 10.88% to 6.8% compared with conventional MFCCs and GFCCs, respectively. We also found that the combination of GFCCs and MLPGCCs gives the highest performance of speaker verification system. The best performance combination achieved is around EER from 0.43% to 0.59% and 1.92% to 3.88%.
Similar content being viewed by others
References
Al-Momani O, Gharaibeh KM (2014) Effect of wireless channels on detection and classification of asthma attacks in wireless remote health monitoring systems. Int J Telemed Appl:1–8
Apsingekar VR, De Leon PL (2011) Speaker verification score normalization using speaker model clusters. Speech Communication, Elsevier Science vol 53, pp 110–118
Brummer N, Villiers ED (2011) The BOSARIS toolkit: theory, algorithms and code for surviving the new DCF. In: NIST SRE11 Analysis Workshop, Atlanta (USA), Dec. 2011, pp:1–23 [Online]. Available : https://sites.google.com/site/nikobrummer/bosaris\toolkit\full\paper.pdf
Dehak N et al (2011) Frontend factor analysis for speaker verification. IEEE Transactions on Audio, Speech, and Language Processing 19(4):788–798
Fedila M, Amrouche A (2012) Automatic speaker recognition for mobile communications using AMR-WB speech coding. IEEE, information science, signal processing and their applications , ISSPA, pp 1034–1038
Fedila M, Bengherabi M, Amrouche A (2017) Gammatone filterbank and symbiotic combination of amplitude and phase-based spectra for robust speaker verification under noisy conditions and compression artifacts. Multimedia Tools Appl:1–19
Gallardo LF (2016) Human and automatic speaker recognition over telecommunication channels. Springer Science + Business Media, Singapore
Glasberg BR, Moore BCJ (1986) Auditory filter shapes in subjects with unilateral and bilateral cochlear impair- ments. J Acoust Sot Am 79:1020–1033
Glasberg, Moore (1990) Derivation of auditory filter shapes from notched-noise data. Journal of Hering Elsevier, vol 47, issues 1–2, pp 103–138
Hansen JHL, Hasan T (2015) Speaker recognition by machines and humans: a tutorial review. IEEE Signal Process Mag 32(6):74–99. https://doi.org/10.1109/MSP.2015.2462851
Jeevan M, Dhingra A, Hanmandlu M, Panigrahi BK (2017) Robust speaker verification using GFCC based i-vectors. In: Proceedings of the international conference on signal, networks, computing, and systems. Springer, New Delhi, pp 85–91
Johannesma PIM (1972) The pre-response stimulus ensemble of neurons in the cochlear nucleus. In: Symposium on hearing theory (IPO, Eindhoven, The Netherlands), pp 58–69
Kaled Dagrouq A, Alkhateeb (2013) Wavelet LPC with neural network for speaker identification system. Wseas Transactions on Signal Processing 9:216–226
Kanagasundaram A (2018) Improving the performance of GPLDA speaker verification using unsupervised inter-dataset variability compensation approaches. Int J Speech Technol 21:533–544
Kenny P, Stafylakis T, Ouellet P, Alam J, Dumouchel P (2013) PLDA for speaker verification utterances of arbitrary duration. In: Proceedings of IEEE international conference on acoustics, speech signal processing, pp 7649–7653
Kinnunen T, Li H (2010) An overview of text-independent speaker recognition: From features to supervectors. Speech Commun 52:12–40
Krobba A, Debyeche M, Amrouche A (2010) Evaluation of speaker identification system using GSM-EFR speech data. In: proc. of int. conf. on design and technology of integrated systems (nanoscale era), Hammamet, DTIS, Tuins, IEEE, pp 1-5
Krobba A, Debyeche M, Selouani SA (2018) Feature extraction using mixture linear prediction Gammatone filter for robust speaker verification over AWGN Channel. 4th International Conference on Signal, Image, Vision and their Applications, Guelma – Algeria, 26–27 November
Li Z, Gao Y (2015) Acoustic feature extraction method for robust speaker identification. Multimed Tools Appl 75(12):7391–7406
Li X, Wang L, Zhu J (2017) SNR-multiconditon approaches of robust speaker model compensation based on PLDA in practical environment. In Proceedings on the international conference on artificial intelligence (ICAI) (pp. 146-150). The Steering Committee of the World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp)
Mak M-W, Pang X, Chien J-T (2016) Mixture of PLDA for noise robust I-vector speaker verification. IEEE/ACM Transactions on Audio, Speech, and Language Processing 24(1):130–142
Ming J, Hazen TJ, Glass JR, Reynolds DA (2007) Robust speaker recognition in noisy conditions. IEEE Transactions on Audio, Speech, and Language Processing 15(5):1711–1172
NIST Year (2008) Speaker recognition evaluation plan, Technical report, NIST. http:www.itl.nist.gov/iad/mig/yest/ser/2008
Padilla M, Quatieri T, Reynolds D (2006) Missing feature theory with soft spectral subtraction for speaker. Verification. CSLP, ninth international conference on spoken language processing, Pittsburgh, PA, USA
Pahlavan K, Krishnamurthy P (2011) Principles of wireless networks: a unified approach. Prentice Hall PTR
Pawan K. A, Navnath S. Nehe · Dattatray V. Jadhav · Raghunath S. H, (2012). Robust feature extraction from: spectrum estimated using bispectrum for speaker recognition. Int J Speech Technol 15, pp:433–440.
Pelecanos J, Sridharan S (2001) Feature warping for robust speaker verification. In: Proc. ISCA speaker recognition workshop Odyssey, Chania, Crete, pp 213–218
Pohjalainen J, Alku P (2014) Gaussian mixture linear prediction. IEEE international conference on on acoustics, speech and signal processing (ICASSP), pp 6285-6289
Pohjalainen J, Saeidi R, Kinnunen T, Alku P (2010) Extended weighted linear prediction (XLP) analysis of speech and its application to speaker verification in adverse conditions. In: Proc Interspeech, Japan
Pohjalainen J, Cemal H, Kinnunen T, Alku P (2014) Mixture linear prediction in speaker verification under vocal effort mismatch. IEEE Signal Process Lett 21(12):1516–1520
Prince Simon JD, Elder JH (2007) Probabilistic linear discriminant analysis for inferences about identity. IEEE 11th international conference on computer vision. ICCV’07, pp 1–8
Rahman MH, Kanagasundaram A, Himawan I, Dean D, Sridharan S (2018) Improving PLDA speaker verification performance using domain mismatch compensation techniques. Comput Speech Lang 47:240–258
Rao W, Mak MW (2013) Boosting the performance of i-vector based speaker verification via utterance partitioning. IEEE Trans Audio Speech Lang Process 21(5):1012–1022
Ravindran S, Anderson DV, Slaney M (2006) Improving the noise robustness of mel-frequency cepstral coefficients for speech processing. In: Proc. ISCA SAPA. Pittsburgh, PA, pp 48–52
Recommendation G (2003) 722.2: wideband coding of speech at around 16 kbit/s using adaptive MultiRate wideband (AMR-WB)
Riadh A, Salim S, Said G, Ali CA, Taleb-A (2014) An efficient approach for MFCC feature extraction for text Independant speaker identification system. Int J Commun 9:114–122
Saeidi R, Pohjalainen J, Kinnunen T, Alku P (2010) Temporally weighted linear prediction features for tackling additive noise in speaker verification. IEEE Signal Process Lett 17(6):599–602
Seyed OS, Malcolm S, Heck L (2013) MSR identity toolbox v.1.0.A MATLAB toolbox for speaker recognition research In: Proc, IEEE Signal Process, Speech and Language Processing Technical Committee Newsletter
Sreenivasa R K, Vuppala AK (2014) Speech processing in mobile environments. Springer, ISBN: 978–319–03116-3
Y. Zhang, Y. Long· X. Shen, H. Wei, M. Yang, H. Ye, H. Mao, (2017). Articulatory movement features for short-duration text dependent speaker verification. Int J Speech Technol 20, 753–759.
Zhao X, Wang D (2013) Analyzing noise robustness of MFCC and GFCC features in speaker identification. ICASSP’13, pp 7204–7208
Zhao X, Shao Y, Wang DL (2012) CASA-based robust speaker identification. IEEE Transactions on Audio, Speech, and Language Processing 20(5):1608–1616
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Krobba, A., Debyeche, M. & Selouani, SA. Mixture linear prediction Gammatone Cepstral features for robust speaker verification under transmission channel noise. Multimed Tools Appl 79, 18679–18693 (2020). https://doi.org/10.1007/s11042-020-08748-2
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11042-020-08748-2