Abstract
This paper describes ANN-based posterior estimates and their application to speech recognition. We replaced standard back-propagation training with the L-BFGS quasi-Newton method. We focused solely on posterior-based feature vector extraction, with the goal of reducing the dimension of the feature vectors. To this end, we designed three transforms that map the posteriors into a space of dimensionality 1 or 2. The designed transforms were tested on the SpeechDat-East corpus. We also applied the introduced method to a Czech audio-visual corpus. In both cases the methods led to a significant decrease in word error rate.
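The abstract does not give the network architecture or training details. The following is a minimal, hypothetical sketch of replacing back-propagation's gradient-descent weight updates with L-BFGS. It fits a single-layer softmax network (a stand-in for the paper's ANN) that outputs class posteriors, passing its cross-entropy loss and gradient to a small hand-written L-BFGS routine (two-loop recursion with a backtracking Armijo line search). The toy data, the function names `lbfgs`, `make_objective` and `posteriors`, the history size `m=5`, and the L2 weight are illustrative assumptions, not the authors' setup. The three dimension-reducing posterior transforms are not described in the abstract and are not reproduced here.

```python
import math


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def lbfgs(f_and_grad, x0, m=5, max_iter=200, tol=1e-8):
    """Minimise f with L-BFGS (two-loop recursion) and a backtracking Armijo line search.

    f_and_grad(x) must return (f(x), grad f(x)) with the gradient as a list.
    """
    x = list(x0)
    fx, g = f_and_grad(x)
    s_hist, y_hist = [], []  # last m pairs s = x_new - x, y = g_new - g
    for _ in range(max_iter):
        if math.sqrt(dot(g, g)) < tol:
            break
        # First loop (newest to oldest pair): q <- q - alpha_i * y_i.
        q = list(g)
        alphas = []
        for s, y in zip(reversed(s_hist), reversed(y_hist)):
            a = dot(s, q) / dot(y, s)
            alphas.append(a)
            q = [qi - a * yi for qi, yi in zip(q, y)]
        # Initial inverse-Hessian approximation H0 = gamma * I (scaled by newest pair).
        gamma = dot(s_hist[-1], y_hist[-1]) / dot(y_hist[-1], y_hist[-1]) if s_hist else 1.0
        r = [gamma * qi for qi in q]
        # Second loop (oldest to newest pair): r <- r + (alpha_i - beta_i) * s_i.
        for s, y, a in zip(s_hist, y_hist, reversed(alphas)):
            b = dot(y, r) / dot(y, s)
            r = [ri + (a - b) * si for ri, si in zip(r, s)]
        d = [-ri for ri in r]  # quasi-Newton search direction -H g
        # Backtracking line search: halve the step until f decreases sufficiently.
        step, slope = 1.0, dot(g, d)
        while True:
            x_new = [xi + step * di for xi, di in zip(x, d)]
            f_new, g_new = f_and_grad(x_new)
            if f_new <= fx + 1e-4 * step * slope or step < 1e-10:
                break
            step *= 0.5
        s = [xn - xo for xn, xo in zip(x_new, x)]
        y = [gn - go for gn, go in zip(g_new, g)]
        if dot(s, y) > 1e-12:  # store only pairs with positive curvature
            s_hist.append(s)
            y_hist.append(y)
            if len(s_hist) > m:
                s_hist.pop(0)
                y_hist.pop(0)
        x, fx, g = x_new, f_new, g_new
    return x, fx


def make_objective(X, labels, n_classes, l2=1e-2):
    """Regularised cross-entropy of a single-layer softmax network.

    Weights are one flat list: n_classes rows of (n_features + 1) values (last = bias).
    """
    n_in = len(X[0]) + 1

    def f_and_grad(w):
        loss = 0.5 * l2 * dot(w, w)
        grad = [l2 * wi for wi in w]
        for x, t in zip(X, labels):
            xb = list(x) + [1.0]
            z = [dot(w[c * n_in:(c + 1) * n_in], xb) for c in range(n_classes)]
            zmax = max(z)
            lse = zmax + math.log(sum(math.exp(v - zmax) for v in z))  # log-sum-exp
            loss += lse - z[t]  # -log P(t | x)
            for c in range(n_classes):
                err = math.exp(z[c] - lse) - (1.0 if c == t else 0.0)  # P(c|x) - target
                for j in range(n_in):
                    grad[c * n_in + j] += err * xb[j]
        return loss, grad

    return f_and_grad


def posteriors(w, x, n_classes):
    """Class posteriors P(c | x) from the trained single-layer network."""
    n_in = len(x) + 1
    xb = list(x) + [1.0]
    z = [dot(w[c * n_in:(c + 1) * n_in], xb) for c in range(n_classes)]
    zmax = max(z)
    e = [math.exp(v - zmax) for v in z]
    total = sum(e)
    return [v / total for v in e]


# Hypothetical toy data: 2-D "features" with 3 classes standing in for phonemes.
X = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (0.0, 1.0), (0.1, 0.9)]
labels = [0, 0, 1, 1, 2, 2]
w, final_loss = lbfgs(make_objective(X, labels, 3), [0.0] * 9)
for x in X:
    print(x, [round(p, 3) for p in posteriors(w, x, 3)])
```

The gradient is still computed exactly as back-propagation would compute it. Only the update rule changes: L-BFGS builds a curvature estimate from the last `m` gradient differences instead of taking fixed-size gradient steps. In practice one would more likely call a library optimiser such as SciPy's `scipy.optimize.minimize(..., method="L-BFGS-B", jac=True)`.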
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zelinka, J., Šmídl, L., Trmal, J., Müller, L. (2010). Posterior Estimates and Transforms for Speech Recognition. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2010. Lecture Notes in Computer Science, vol 6231. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15760-8_61
DOI: https://doi.org/10.1007/978-3-642-15760-8_61
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15759-2
Online ISBN: 978-3-642-15760-8
eBook Packages: Computer Science, Computer Science (R0)