Abstract
In text-independent speaker recognition, Gaussian Mixture Models (GMMs) are widely employed as statistical models of the speakers. It is commonly assumed that the Expectation Maximization (EM) algorithm can estimate the optimal model parameters, namely the weight, mean, and variance of each Gaussian component, for each speaker. This assumption does not fully hold in practice, because the training database is of limited size and the estimated parameters are themselves uncertain. As is well known in the literature, limited-size databases are one of the largest challenges in speaker recognition research. In this paper, we investigate methods to overcome the database and parameter uncertainty problems. By reformulating the GMM estimation problem in a Bayes-optimal way (as opposed to the maximum-likelihood (ML) optimality targeted by the EM algorithm), we are able to adjust the GMM parameters to better cope with limited database size and other parameter uncertainties. Experimental results show the effectiveness of the proposed approach.
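For orientation, the ML baseline that the abstract contrasts against can be sketched as a plain EM fit of a small GMM. The snippet below is only an illustrative sketch under stated assumptions: one-dimensional synthetic data, two components, a variance floor, and hypothetical helper names (`gaussian_pdf`, `em_step`, `fit_gmm`) that do not come from the chapter. It does not implement the Bayes-optimal reformulation, whose details the abstract does not give.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_step(data, weights, means, variances, var_floor=1e-3):
    """One EM iteration for a 1-D GMM.

    Returns the updated (weights, means, variances) and the log-likelihood
    of the data under the parameters passed in.
    """
    k, n = len(weights), len(data)
    resp_sums = [0.0] * k   # sum of responsibilities per component
    x_sums = [0.0] * k      # responsibility-weighted sum of x
    xx_sums = [0.0] * k     # responsibility-weighted sum of x^2
    log_lik = 0.0
    for x in data:
        # E-step: posterior probability of each component for this sample.
        joint = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
        total = sum(joint)
        log_lik += math.log(total)
        for j in range(k):
            r = joint[j] / total
            resp_sums[j] += r
            x_sums[j] += r * x
            xx_sums[j] += r * x * x
    # M-step: maximum-likelihood updates of weight, mean, and variance.
    new_weights = [resp_sums[j] / n for j in range(k)]
    new_means = [x_sums[j] / resp_sums[j] for j in range(k)]
    new_vars = [
        max(xx_sums[j] / resp_sums[j] - new_means[j] ** 2, var_floor)  # floor is an assumption
        for j in range(k)
    ]
    return new_weights, new_means, new_vars, log_lik


def fit_gmm(data, n_iter=25):
    """Fit a two-component GMM by EM from a simple min/max initialisation."""
    weights = [0.5, 0.5]
    means = [min(data), max(data)]
    variances = [1.0, 1.0]
    log_liks = []
    for _ in range(n_iter):
        weights, means, variances, ll = em_step(data, weights, means, variances)
        log_liks.append(ll)
    return weights, means, variances, log_liks


# Synthetic "small database": 30 samples from each of two Gaussians (assumed values).
random.seed(0)
data = [random.gauss(-2.0, 0.5) for _ in range(30)] + [random.gauss(3.0, 1.0) for _ in range(30)]
weights, means, variances, log_liks = fit_gmm(data)
print("weights:", weights)
print("means:", means)
print("variances:", variances)
```

With so few samples per component, the ML estimates returned here can deviate noticeably from the generating values, which is the kind of limited-data uncertainty the abstract refers to.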
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Garcia, G., Jung, SK., Eriksson, T. (2007). Bayes-Optimal Estimation of GMM Parameters for Speaker Recognition. In: Müller, C. (eds) Speaker Classification II. Lecture Notes in Computer Science, vol 4441. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74122-0_13
DOI: https://doi.org/10.1007/978-3-540-74122-0_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74121-3
Online ISBN: 978-3-540-74122-0
eBook Packages: Computer Science (R0)