Abstract
This paper presents a stochastic segmental speech recogniser that models the a posteriori probabilities directly. The main issues concerning the system are segmental phoneme classification, utterance-level aggregation and the pruning of the search space. For phoneme classification, artificial neural networks and support vector machines are applied. Phonemic segmentation and utterance-level aggregation are performed with the aid of anti-phoneme modelling. At the phoneme level, the system convincingly outperforms the HMM system trained on the same corpus, while at the word level it attains the performance of the HMM system trained without embedded training.
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tóth, L., Kocsor, A., Kovács, K. (2000). A Discriminative Segmental Speech Model and Its Application to Hungarian Number Recognition. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2000. Lecture Notes in Computer Science, vol 1902. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45323-7_52
DOI: https://doi.org/10.1007/3-540-45323-7_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41042-3
Online ISBN: 978-3-540-45323-9
eBook Packages: Springer Book Archive