Abstract
This article compares the performance of neural network and Gaussian mixture model (GMM) acoustic models. We carried out tests that compare various models in terms of speed and recognition accuracy. Because the speed-accuracy trade-off depends not only on the acoustic model itself but also on the decoder parameter settings, we propose a comparison based on an equal number of active states during the decoding search. Statistical significance measures are also discussed, and a new method for confidence interval computation is introduced.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pavelka, T., Ekštein, K. (2009). A Comparison of Acoustic Models Based on Neural Networks and Gaussian Mixtures. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2009. Lecture Notes in Computer Science, vol 5729. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04208-9_41
DOI: https://doi.org/10.1007/978-3-642-04208-9_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04207-2
Online ISBN: 978-3-642-04208-9
eBook Packages: Computer Science, Computer Science (R0)