Abstract
In telephone-based speaker identification, variation in handset characteristics can introduce severe speech variability, even for speech uttered by the same speaker. This paper proposes a method to compensate for this variation. In the method, a number of Gaussian mixture models (GMMs) are independently trained to identify the most likely handset given a test utterance. The identified handset is used to select a compensation vector from a set of pre-computed vectors, each of which is the average frame-by-frame difference between the clean and distorted utterances. The clean features are then recovered by subtracting the selected compensation vector from the distorted vectors. Experimental results based on 138 speakers of the YOHO and telephone YOHO corpora show that the proposed approach is computationally efficient and increases the identification accuracy from 17% (without compensation) to 85% (with compensation).
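The pipeline described in the abstract (handset selection by GMM likelihood, followed by subtraction of a pre-computed vector) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the handset labels, the hand-set diagonal-covariance GMM parameters, the two-dimensional toy features, and the sign convention (distorted minus clean). GMM training by EM is omitted.

```python
import math

def gmm_log_likelihood(frames, weights, means, variances):
    """Total log-likelihood of all frames under a diagonal-covariance GMM."""
    total = 0.0
    for x in frames:
        comp = []
        for w, mu, var in zip(weights, means, variances):
            ll = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                ll += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
            comp.append(ll)
        # Log-sum-exp over mixture components for numerical stability.
        m = max(comp)
        total += m + math.log(sum(math.exp(c - m) for c in comp))
    return total

def compensation_vector(clean_frames, distorted_frames):
    """Average frame-by-frame difference (assumed here: distorted minus clean)."""
    n, dim = len(clean_frames), len(clean_frames[0])
    return [sum(d[k] - c[k] for c, d in zip(clean_frames, distorted_frames)) / n
            for k in range(dim)]

def select_handset(frames, handset_gmms):
    """Return the label of the handset whose GMM scores the utterance highest."""
    return max(handset_gmms,
               key=lambda h: gmm_log_likelihood(frames, *handset_gmms[h]))

def compensate(frames, vector):
    """Subtract the selected compensation vector from every distorted frame."""
    return [[xi - vi for xi, vi in zip(x, vector)] for x in frames]

# Toy data (hypothetical): clean frames and two simulated handset distortions.
clean = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0]]
distorted_a = [[x + 1.5, y + 1.5] for x, y in clean]
distorted_b = [[x - 1.5, y - 1.5] for x, y in clean]

# One hand-set 2-component GMM per handset: (weights, means, variances).
HANDSET_GMMS = {
    "handset_a": ([0.5, 0.5], [[1.0, 1.0], [2.0, 2.0]], [[0.5, 0.5], [0.5, 0.5]]),
    "handset_b": ([0.5, 0.5], [[-1.0, -1.0], [-2.0, -2.0]], [[0.5, 0.5], [0.5, 0.5]]),
}
COMP_VECTORS = {
    "handset_a": compensation_vector(clean, distorted_a),
    "handset_b": compensation_vector(clean, distorted_b),
}

handset = select_handset(distorted_a, HANDSET_GMMS)
recovered = compensate(distorted_a, COMP_VECTORS[handset])
```

In this toy setting the distortion is a pure additive offset, so subtraction recovers the clean frames up to floating-point error. Real handset distortion would only be approximated by a single average vector per handset.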
S. Y. Kung is on sabbatical from Princeton University, USA. He is currently a distinguished chair professor in the Department of Electronic and Information Engineering, The Hong Kong Polytechnic University. This project was supported by The Hong Kong Polytechnic University, Grant No. 1.42.37.A410.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yiu, K.K., Mak, M.W., Kung, S.Y. (2001). A GMM-Based Handset Selector for Channel Mismatch Compensation with Applications to Speaker Identification. In: Shum, HY., Liao, M., Chang, SF. (eds) Advances in Multimedia Information Processing — PCM 2001. PCM 2001. Lecture Notes in Computer Science, vol 2195. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45453-5_156
DOI: https://doi.org/10.1007/3-540-45453-5_156
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42680-6
Online ISBN: 978-3-540-45453-3
eBook Packages: Springer Book Archive