Abstract
This paper evaluates the impact of low-level features on speaker verification performance, with an emphasis on the recently proposed MFCC variant based on asymmetric tapers (hereafter MFCC-asymmetric). These features are used either stand-alone or followed by PCA, a linear projection technique applied before the GMM-UBM back-end classifier, in clean and noisy environments. The performance of the MFCC-asymmetric features is compared with that of the standard Mel-Frequency Cepstral Coefficients (MFCC) extracted from the TIMIT corpus, under clean and noisy conditions. A score-level fusion framework based on simple linear methods such as min, max, sum, etc., and on trained methods such as SVM, is proposed to improve performance and to mitigate noise degradation. The results obtained on the corrupted TIMIT database confirm the superiority of the fused system over each individual system in noisy environments, and show a drastic degradation in the performance of PCA-based systems in the presence of environmental noise.
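The simple fusion rules named in the abstract (min, max, sum) can be illustrated with a minimal sketch. It is not the authors' implementation: the min-max normalization step, the function names, and the example scores are all assumptions made for illustration, and the trained (SVM) fusion variant is not shown.

```python
from typing import Callable, Dict, List, Sequence


def minmax_normalize(scores: Sequence[float]) -> List[float]:
    """Map one system's scores to [0, 1] (assumed normalization, not from the paper)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Constant scores carry no ranking information; place them mid-range.
        return [0.5 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


# Fixed combination rules applied to the per-trial scores of all systems.
FUSION_RULES: Dict[str, Callable[[Sequence[float]], float]] = {
    "min": min,
    "max": max,
    "sum": sum,
}


def fuse_scores(system_scores: Sequence[Sequence[float]], rule: str = "sum") -> List[float]:
    """Fuse per-trial scores from several verification systems with a fixed rule."""
    if rule not in FUSION_RULES:
        raise ValueError(f"unknown fusion rule: {rule!r}")
    n_trials = len(system_scores[0])
    if any(len(s) != n_trials for s in system_scores):
        raise ValueError("all systems must score the same trials")
    # Normalize each system separately so their score ranges are comparable.
    normalized = [minmax_normalize(s) for s in system_scores]
    combine = FUSION_RULES[rule]
    return [combine([system[t] for system in normalized]) for t in range(n_trials)]


# Hypothetical scores for three trials from two front-ends.
mfcc_scores = [0.0, 2.0, 4.0]
mfcc_asymmetric_scores = [10.0, 30.0, 20.0]

fused_sum = fuse_scores([mfcc_scores, mfcc_asymmetric_scores], rule="sum")
fused_min = fuse_scores([mfcc_scores, mfcc_asymmetric_scores], rule="min")
fused_max = fuse_scores([mfcc_scores, mfcc_asymmetric_scores], rule="max")
```

Each fused score would then be compared against a decision threshold to accept or reject the claimed identity. Normalizing each system's scores first keeps one system's larger numeric range from dominating the combination.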
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Asbai, N., Bengherabi, M., Harizi, F., Amrouche, A. (2014). Effect of the Front-End Processing on Speaker Verification Performance Using PCA and Scores Level Fusion. In: Obaidat, M., Filipe, J. (eds) E-Business and Telecommunications. ICETE 2013. Communications in Computer and Information Science, vol 456. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44788-8_21
DOI: https://doi.org/10.1007/978-3-662-44788-8_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44787-1
Online ISBN: 978-3-662-44788-8
eBook Packages: Computer Science, Computer Science (R0)