Explicit Duration Modelling in HMM/ANN Hybrids

Tóth, László; Kocsor, András

doi:10.1007/11551874_40

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3658)

Included in the following conference series:

International Conference on Text, Speech and Dialogue

Abstract

In some languages, such as Finnish or Hungarian, phone duration is a very important distinctive acoustic cue. The conventional HMM speech recognition framework, however, is known to model duration information poorly. In this paper we compare different duration models within the framework of HMM/ANN hybrids. The tests are performed with two different hybrid models: the conventional one and the recently proposed “averaging hybrid”. Independent of the model configuration, we report that the usual exponential duration model has no detectable advantage over using no duration model at all. Similarly, applying the same fixed value for all state transition probabilities, as is usual with HMM/ANN systems, is found to have no influence on the performance. However, the practical trick of imposing a minimum duration on the phones turns out to be very useful. The key part of the paper is the introduction of the gamma distribution duration model, which proves clearly superior to the exponential one, yielding a 12-20% relative improvement in the word error rate and thus justifying the use of sophisticated duration models in speech recognition.
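To make the contrast between the duration models concrete, the following minimal sketch compares the textbook forms of the two distributions. It is not the authors' implementation: the abstract gives no formulas, so the geometric (discrete exponential) form implied by a fixed self-loop probability, the method-of-moments gamma fit, and all function names and the sample durations are assumptions made for illustration.

```python
import math


def geometric_log_prob(d, self_loop):
    """Log-probability of staying exactly d frames in a state whose
    self-loop probability is fixed (the implicit HMM duration model)."""
    if d < 1:
        return float("-inf")
    return (d - 1) * math.log(self_loop) + math.log(1.0 - self_loop)


def fit_gamma_moments(durations):
    """Method-of-moments estimate of gamma shape and scale.
    Assumption: the paper may estimate its parameters differently."""
    n = len(durations)
    mean = sum(durations) / n
    var = sum((d - mean) ** 2 for d in durations) / n
    return mean * mean / var, var / mean


def gamma_log_pdf(d, shape, scale):
    """Log-density of a gamma distribution evaluated at duration d."""
    if d <= 0:
        return float("-inf")
    return ((shape - 1) * math.log(d) - d / scale
            - math.lgamma(shape) - shape * math.log(scale))


def satisfies_min_duration(d, min_frames):
    """Minimum-duration constraint: reject segments shorter than min_frames."""
    return d >= min_frames


if __name__ == "__main__":
    # Hypothetical phone durations in frames, invented for this example.
    observed = [6, 8, 9, 10, 10, 11, 12, 14]
    shape, scale = fit_gamma_moments(observed)
    for d in (1, 5, 10, 20):
        print(d,
              round(geometric_log_prob(d, 0.9), 3),
              round(gamma_log_pdf(d, shape, scale), 3),
              satisfies_min_duration(d, 3))
```

The geometric model always assigns its highest probability to a one-frame duration, whereas a fitted gamma density can peak near the typical phone length. This difference is one plausible reading of why an explicit duration model can help where the exponential one does not.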

Author information

Authors and Affiliations

Research Group on Artificial Intelligence, H-6720, Szeged, Aradi vértanúk tere 1, Hungary
László Tóth & András Kocsor

Editor information

Editors and Affiliations

Department of Computer Science, University of West Bohemia in Pilsen, Univerzitni 8, 30614, Plzen, Czech Republic
Václav Matoušek, Pavel Mautner & Tomáš Pavelka

About this paper

Cite this paper

Tóth, L., Kocsor, A. (2005). Explicit Duration Modelling in HMM/ANN Hybrids. In: Matoušek, V., Mautner, P., Pavelka, T. (eds) Text, Speech and Dialogue. TSD 2005. Lecture Notes in Computer Science, vol 3658. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551874_40

DOI: https://doi.org/10.1007/11551874_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28789-6
Online ISBN: 978-3-540-31817-0
eBook Packages: Computer Science, Computer Science (R0)