An Automatic Retraining Method for Speaker Independent Hidden Markov Models

  • Conference paper
Text, Speech and Dialogue (TSD 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4629)

Abstract

Training speaker-independent HMM-based acoustic models requires a large amount of manually transcribed acoustic training data from many different speakers. Such training databases vary greatly in speaker pitch, articulation and speaking rate. In practice, speaker-independent models are used to bootstrap the speaker-dependent models built by speaker adaptation methods. The performance of the adaptation methods is therefore strongly influenced by the performance of the speaker-independent model and by the accuracy of the automatic segmentation, which also depends on the base model. The performance of speaker-independent models can, however, vary a great deal across test speakers. Our goal here is to reduce this variability by raising performance for the speakers on whom the model performs poorly, at the price of allowing a small drop in the highest performance values. For this purpose we propose a new method for the automatic retraining of speaker-independent HMMs.

Editor information

Václav Matoušek, Pavel Mautner

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bánhalmi, A., Busa-Fekete, R., Kocsor, A. (2007). An Automatic Retraining Method for Speaker Independent Hidden Markov Models. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2007. Lecture Notes in Computer Science, vol 4629. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74628-7_50

  • DOI: https://doi.org/10.1007/978-3-540-74628-7_50

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74627-0

  • Online ISBN: 978-3-540-74628-7

  • eBook Packages: Computer Science (R0)
