Abstract
When training speaker-independent HMM-based acoustic models, a large amount of manually transcribed acoustic training data must be available from many different speakers. These training databases show great variation in speaker pitch, articulation, and speaking rate. In practice, speaker-independent models are used to bootstrap the speaker-dependent models built by speaker adaptation methods. Thus the performance of the adaptation methods is strongly influenced by the performance of the speaker-independent model and by the accuracy of the automatic segmentation, which also depends on the base model. In practice, the performance of speaker-independent models can vary considerably across test speakers. Our goal here is to reduce this performance variability by raising the performance for the lowest-scoring speakers, at the cost of a small drop in the highest scores. For this purpose we propose a new method for the automatic retraining of speaker-independent HMMs.
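The trade-off stated above (raising the weakest speakers' scores while accepting a small drop for the strongest) can be quantified with simple spread statistics over per-speaker scores. The sketch below is not the authors' retraining method, which the abstract does not describe. The function name `spread_summary` and all accuracy values are hypothetical, invented only to show the computation.

```python
from statistics import mean, pstdev


def spread_summary(scores):
    """Summarize per-speaker performance scores (e.g. word accuracy in %).

    Hypothetical helper: a smaller 'std' and a larger 'min' indicate
    less variability across test speakers.
    """
    return {
        "mean": mean(scores),
        "std": pstdev(scores),  # population standard deviation
        "min": min(scores),
        "max": max(scores),
    }


# Hypothetical per-speaker accuracies, invented for illustration only.
baseline = [92.0, 88.0, 70.0, 65.0]
retrained = [90.0, 87.0, 76.0, 73.0]

for name, scores in (("baseline", baseline), ("retrained", retrained)):
    s = spread_summary(scores)
    print(f"{name}: mean={s['mean']:.2f} std={s['std']:.2f} "
          f"min={s['min']} max={s['max']}")
```

In this made-up example the best speaker loses a little, while the worst speakers gain and the standard deviation shrinks, which is the kind of shift the stated goal describes.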
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bánhalmi, A., Busa-Fekete, R., Kocsor, A. (2007). An Automatic Retraining Method for Speaker Independent Hidden Markov Models. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2007. Lecture Notes in Computer Science, vol 4629. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74628-7_50
DOI: https://doi.org/10.1007/978-3-540-74628-7_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74627-0
Online ISBN: 978-3-540-74628-7
eBook Packages: Computer Science, Computer Science (R0)