Abstract
This paper proposes a speaker adaptation methodology that first automatically determines a number of speaker clusters in the training material, then estimates the parameters of the corresponding models, and finally applies a fast match strategy, based on so-called histogram models, to choose the optimal cluster for each test utterance. The fast match strategy is critical to making this methodology useful in real applications, since carrying out several recognition passes (one for each cluster of speakers) and then selecting the decoded string with the highest likelihood would be too costly. Preliminary experiments on two Spanish speech databases indicate that both the clustering algorithm and the fast match strategy are consistent and reliable. The histogram models, though suboptimal (they guessed the right cluster for unseen test speakers in 85% of the cases with read speech and in 63% of the cases with spontaneous speech), yielded around a 6% decrease in error rate in phonetic recognition experiments.
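The abstract does not define the histogram models, so the following is only a minimal sketch of how a fast match over speaker clusters might look, not the authors' implementation. It assumes that acoustic frames have already been vector-quantized into codeword indices, that each cluster is summarized by a smoothed histogram of codeword frequencies, and that the cluster with the highest log-likelihood for the test utterance is selected. The function names, the additive smoothing, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def train_histogram(codeword_sequences, codebook_size, smoothing=1.0):
    """Estimate a smoothed codeword distribution for one speaker cluster.

    Assumption: frames are already quantized to indices in [0, codebook_size).
    Additive smoothing keeps unseen codewords from getting zero probability.
    """
    counts = Counter()
    for seq in codeword_sequences:
        counts.update(seq)
    total = sum(counts.values()) + smoothing * codebook_size
    return [(counts[k] + smoothing) / total for k in range(codebook_size)]


def log_likelihood(codewords, histogram):
    """Log-likelihood of an utterance, treating frames as independent draws."""
    return sum(math.log(histogram[k]) for k in codewords)


def choose_cluster(codewords, histograms):
    """Return the index of the cluster whose histogram best explains the utterance."""
    scores = [log_likelihood(codewords, h) for h in histograms]
    return max(range(len(histograms)), key=scores.__getitem__)


# Toy training data (hypothetical): two clusters with disjoint codeword usage.
cluster_a = [[0, 0, 1, 0], [0, 1, 0, 0]]
cluster_b = [[2, 3, 3, 2], [3, 3, 2, 3]]
histograms = [train_histogram(c, codebook_size=4) for c in (cluster_a, cluster_b)]

# Pick a cluster for an unseen utterance before running a single recognition pass.
selected = choose_cluster([3, 2, 3, 3], histograms)
```

Under these assumptions, selecting a cluster costs one table lookup per frame and per cluster, which is far cheaper than running one full recognition pass per cluster and comparing the likelihoods of the decoded strings.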
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rodríguez, L.J., Torres, M.I. (2004). A Speaker Clustering Algorithm for Fast Speaker Adaptation in Continuous Speech Recognition. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2004. Lecture Notes in Computer Science, vol 3206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30120-2_55
DOI: https://doi.org/10.1007/978-3-540-30120-2_55
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23049-6
Online ISBN: 978-3-540-30120-2
eBook Packages: Springer Book Archive