Developing HMM-Based Recognizers with ESMERALDA

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1692)

Abstract

ESMERALDA is an integrated environment for the development of speech recognition systems. It provides a powerful selection of methods for building statistical models together with an efficient incremental recognizer. This paper describes the approaches adopted for estimating mixture densities, Hidden Markov Models, and n-gram language models, as well as the algorithms applied during recognition. Evaluation results on a speaker-independent spontaneous speech recognition task demonstrate the capabilities of ESMERALDA.

Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fink, G.A. (1999). Developing HMM-Based Recognizers with ESMERALDA. In: Matousek, V., Mautner, P., Ocelíková, J., Sojka, P. (eds) Text, Speech and Dialogue. TSD 1999. Lecture Notes in Computer Science, vol 1692. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48239-3_42

  • DOI: https://doi.org/10.1007/3-540-48239-3_42

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-66494-9

  • Online ISBN: 978-3-540-48239-0

  • eBook Packages: Springer Book Archive
