Explicit Duration Modelling in HMM/ANN Hybrids

  • Conference paper
Text, Speech and Dialogue (TSD 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3658)

Abstract

In some languages, such as Finnish or Hungarian, phone duration is a very important distinctive acoustic cue. The conventional HMM speech recognition framework, however, is known to model duration information poorly. In this paper we compare different duration models within the framework of HMM/ANN hybrids. The tests are performed with two different hybrid models: the conventional one and the recently proposed “averaging hybrid”. Independent of the model configuration, we report that the usual exponential duration model has no detectable advantage over using no duration model at all. Similarly, applying the same fixed value for all state transition probabilities, as is usual with HMM/ANN systems, is found to have no influence on the performance. However, the practical trick of imposing a minimum duration on the phones turns out to be very useful. The key part of the paper is the introduction of the gamma distribution duration model, which proves clearly superior to the exponential one, yielding a 12–20% relative improvement in the word error rate and thus justifying the use of sophisticated duration models in speech recognition.
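The contrast between the two duration models named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the method-of-moments fit, and the sample durations are all assumptions made for illustration. It relies only on the standard fact that a fixed self-loop probability in an HMM state implies a geometric (discrete exponential) duration distribution, which always peaks at the shortest duration, whereas a gamma distribution can peak near the typical phone length.

```python
import math

def geometric_log_pmf(d, self_loop):
    """Log-probability of staying exactly d frames in a state whose
    self-loop probability is fixed (the implicit HMM duration model)."""
    return (d - 1) * math.log(self_loop) + math.log(1.0 - self_loop)

def fit_gamma_moments(durations):
    """Method-of-moments estimates (shape, scale) of a gamma distribution.
    This estimator is an illustrative choice, not taken from the paper."""
    n = len(durations)
    mean = sum(durations) / n
    var = sum((d - mean) ** 2 for d in durations) / n
    return mean * mean / var, var / mean

def gamma_log_pdf(d, shape, scale):
    """Log-density of the gamma distribution at duration d > 0."""
    return ((shape - 1.0) * math.log(d) - d / scale
            - math.lgamma(shape) - shape * math.log(scale))

# Hypothetical phone durations in frames (made-up values).
durations = [6, 7, 8, 8, 9, 10, 12]
mean = sum(durations) / len(durations)

# Geometric model with the same mean duration: mean = 1 / (1 - self_loop).
self_loop = 1.0 - 1.0 / mean
shape, scale = fit_gamma_moments(durations)

for d in (1, 4, 8, 16):
    print(d, geometric_log_pmf(d, self_loop), gamma_log_pdf(d, shape, scale))
```

In this toy setting the geometric model assigns its highest score to a one-frame phone, while the gamma model scores an eight-frame phone far above a one-frame one, which is the kind of behaviour an explicit duration model is meant to capture.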

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tóth, L., Kocsor, A. (2005). Explicit Duration Modelling in HMM/ANN Hybrids. In: Matoušek, V., Mautner, P., Pavelka, T. (eds) Text, Speech and Dialogue. TSD 2005. Lecture Notes in Computer Science, vol 3658. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551874_40

  • DOI: https://doi.org/10.1007/11551874_40

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28789-6

  • Online ISBN: 978-3-540-31817-0

  • eBook Packages: Computer Science, Computer Science (R0)
