An Automatic Retraining Method for Speaker Independent Hidden Markov Models

Bánhalmi, András; Busa-Fekete, Róbert; Kocsor, András

doi:10.1007/978-3-540-74628-7_50


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4629)

Included in the following conference series:

International Conference on Text, Speech and Dialogue

Abstract

Training speaker-independent HMM-based acoustic models requires a large amount of manually transcribed acoustic training data from many different speakers. Such training databases vary widely in speaker pitch, articulation and speaking rate. In practice, speaker-independent models are used to bootstrap the speaker-dependent models built by speaker adaptation methods. The performance of the adaptation methods is therefore strongly influenced by the performance of the speaker-independent model and by the accuracy of the automatic segmentation, which also depends on the base model. The performance of speaker-independent models can itself vary a great deal across test speakers. Our goal here is to reduce this variability by raising performance for the speakers with low scores, at the price of a small drop in the highest scores. For this purpose we propose a new method for the automatic retraining of speaker-independent HMMs.
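To make the stated goal concrete, the sketch below shows one possible way to summarize how recognition accuracy varies across test speakers. It is not the authors' retraining method, which the abstract does not describe. The function name `variability_summary`, the speaker labels, and all accuracy figures are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def variability_summary(per_speaker_accuracy):
    """Summarize how much recognition accuracy varies across test speakers.

    per_speaker_accuracy maps a speaker label to an accuracy in percent.
    """
    values = list(per_speaker_accuracy.values())
    return {
        "worst": min(values),
        "best": max(values),
        "range": max(values) - min(values),
        "mean": mean(values),
        "std": pstdev(values),
    }


# Hypothetical accuracies (percent); these are NOT results from the paper.
before = {"spk_a": 95.0, "spk_b": 90.0, "spk_c": 70.0}
after = {"spk_a": 93.0, "spk_b": 90.0, "spk_c": 80.0}

for label, scores in (("before", before), ("after", after)):
    s = variability_summary(scores)
    print(f"{label}: worst={s['worst']:.1f} best={s['best']:.1f} "
          f"range={s['range']:.1f} std={s['std']:.2f}")
```

In this invented example the best speaker loses a little accuracy while the worst speaker gains substantially, so the range and standard deviation both shrink. That is the kind of trade-off the abstract describes.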



Author information

Authors and Affiliations

Research Group on Artificial Intelligence of the Hungarian Academy of Sciences, and of the University of Szeged, H-6720 Szeged, Aradi vértanúk tere 1., Hungary
András Bánhalmi, Róbert Busa-Fekete & András Kocsor


Editor information

Václav Matoušek, Pavel Mautner


Cite this paper

Bánhalmi, A., Busa-Fekete, R., Kocsor, A. (2007). An Automatic Retraining Method for Speaker Independent Hidden Markov Models. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2007. Lecture Notes in Computer Science, vol 4629. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74628-7_50


DOI: https://doi.org/10.1007/978-3-540-74628-7_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74627-0
Online ISBN: 978-3-540-74628-7
eBook Packages: Computer Science, Computer Science (R0)
