Effect of the Front-End Processing on Speaker Verification Performance Using PCA and Scores Level Fusion

Asbai, Nassim; Bengherabi, Messaoud; Harizi, Farid; Amrouche, Abderrahmane

doi:10.1007/978-3-662-44788-8_21

Part of the book series: Communications in Computer and Information Science (CCIS, volume 456)

Included in the following conference series:

International Conference on E-Business and Telecommunications

Abstract

This paper evaluates the impact of low-level features on speaker verification performance, with an emphasis on the recently proposed MFCC variant based on asymmetric tapers (hereafter MFCC-asymmetric). These features are used either stand-alone or followed by PCA, a linear projection technique applied before the GMM-UBM back-end classifier, in clean and noisy environments. The performance of the MFCC-asymmetric features is compared with that of the standard Mel-Frequency Cepstral Coefficients (MFCC) extracted from the TIMIT corpus under clean and noisy conditions. A score-level fusion framework based on simple linear methods such as min, max, and sum, and on trained methods such as SVM, is proposed to improve performance and to mitigate noise degradation. The results obtained on the corrupted TIMIT database confirm the superiority of the fused system over each individual system in noisy environments, and show a drastic degradation in the performance of PCA-based systems in the presence of environmental noise.
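
As an illustration of the simple fusion rules named in the abstract (min, max, sum), the following is a minimal sketch, not the authors' implementation. It assumes each subsystem produces one score per verification trial, and it assumes min-max normalization is applied to each subsystem before fusion so that the scores share a common range; the abstract does not specify a normalization. The function names and the example scores are hypothetical.

```python
import numpy as np


def min_max_normalize(scores):
    """Map one system's scores to [0, 1] (normalization choice is an assumption)."""
    scores = np.asarray(scores, dtype=float)
    lo, hi = scores.min(), scores.max()
    if hi == lo:
        # Constant scores carry no ranking information; map them all to zero.
        return np.zeros_like(scores)
    return (scores - lo) / (hi - lo)


def fuse_scores(system_scores, rule="sum"):
    """Fuse per-trial scores from several systems with a fixed rule.

    system_scores: sequence of shape (n_systems, n_trials).
    rule: one of "min", "max", "sum".
    """
    rules = {"min": np.min, "max": np.max, "sum": np.sum}
    if rule not in rules:
        raise ValueError(f"unknown fusion rule: {rule!r}")
    normalized = np.vstack([min_max_normalize(row) for row in system_scores])
    # Combine across systems (axis 0), leaving one fused score per trial.
    return rules[rule](normalized, axis=0)


# Hypothetical scores for four trials from two front-ends.
mfcc_scores = [0.2, 1.5, -0.7, 2.1]
asym_scores = [10.0, 30.0, 5.0, 25.0]

fused = fuse_scores([mfcc_scores, asym_scores], rule="sum")
print(fused)
```

A trained fusion method such as the SVM mentioned in the abstract would instead learn the combination from labeled target and impostor trials; that step is not shown here.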

Author information

Authors and Affiliations

Center for Development of Advanced Technologies, Algiers, Algeria
Nassim Asbai, Messaoud Bengherabi & Farid Harizi
Speech Communication and Signal Processing Laboratory, Faculty of Electronics and Computer Sciences, USTHB, Bab Ezzouar, 16 111, Algiers, Algeria
Nassim Asbai & Abderrahmane Amrouche

Corresponding author

Correspondence to Nassim Asbai.

Editor information

Editors and Affiliations

Department of Computer Science, Monmouth University, West Long Branch, New Jersey, USA
Mohammad S. Obaidat
Polytechnic Institute of Setúbal, INSTICC, Setúbal, Portugal
Joaquim Filipe

About this paper

Cite this paper

Asbai, N., Bengherabi, M., Harizi, F., Amrouche, A. (2014). Effect of the Front-End Processing on Speaker Verification Performance Using PCA and Scores Level Fusion. In: Obaidat, M., Filipe, J. (eds) E-Business and Telecommunications. ICETE 2013. Communications in Computer and Information Science, vol 456. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44788-8_21

DOI: https://doi.org/10.1007/978-3-662-44788-8_21
Published: 28 September 2014
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44787-1
Online ISBN: 978-3-662-44788-8
eBook Packages: Computer Science, Computer Science (R0)
