Using Prosody in Fixed Stress Languages for Improvement of Speech Recognition

  • Conference paper
Verbal and Nonverbal Communication Behaviours

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4775)

Abstract

In this chapter we examine the use of prosodic features in speech recognition, paying special attention to agglutinating and fixed-stress languages. The introduction reviews current knowledge on exploiting speech prosody. The prosodic features used, the acoustic-prosodic pre-processing, and the segmentation into prosodic units are presented in detail. We use the term “prosodic unit” to distinguish these units from prosodic phrases, which are usually longer. We trained an HMM-based prosodic segmenter that relies on the fundamental frequency and intensity of speech. In a continuous speech recognizer, the output of this prosodic segmenter is used for N-best lattice rescoring, in parallel with a simplified bigram language model, to improve recognition performance. Experiments on Hungarian show a WER reduction of about 4% using simple lattice rescoring. We also compare the performance of the prosodic segmenter with that obtained in our earlier experiments.
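The abstract does not give the rescoring formula, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that each N-best hypothesis carries word start times plus an acoustic log-likelihood and a bigram log-probability, that the prosodic segmenter outputs a list of prosodic-unit onset times, and that agreement between the two earns a weighted bonus added linearly to the score. The motivation is that in a fixed-stress language such as Hungarian, stress falls on the word-initial syllable, so prosodic-unit onsets can be expected to coincide with word onsets. All names (`Hypothesis`, `prosodic_agreement`, `rescore`) and parameters (`tolerance`, `lm_weight`, `prosody_weight`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """One N-best hypothesis (hypothetical structure, for illustration only)."""
    words: List[str]
    word_starts: List[float]  # start time of each word, in seconds
    acoustic_logp: float      # acoustic log-likelihood from the recognizer
    lm_logp: float            # bigram language-model log-probability


def prosodic_agreement(word_starts: List[float],
                       unit_starts: List[float],
                       tolerance: float = 0.1) -> float:
    """Fraction of detected prosodic-unit onsets lying within `tolerance`
    seconds of some word onset in the hypothesis (1.0 if none detected)."""
    if not unit_starts:
        return 1.0
    hits = sum(
        1 for u in unit_starts
        if any(abs(u - w) <= tolerance for w in word_starts)
    )
    return hits / len(unit_starts)


def rescore(nbest: List[Hypothesis],
            unit_starts: List[float],
            lm_weight: float = 10.0,
            prosody_weight: float = 5.0,
            tolerance: float = 0.1) -> List[Hypothesis]:
    """Sort hypotheses best-first by a linear combination of acoustic score,
    weighted LM score, and weighted prosodic agreement (assumed scheme)."""
    def score(h: Hypothesis) -> float:
        return (h.acoustic_logp
                + lm_weight * h.lm_logp
                + prosody_weight * prosodic_agreement(
                    h.word_starts, unit_starts, tolerance))
    return sorted(nbest, key=score, reverse=True)


# Usage: two hypotheses with equal LM scores; B has a slightly worse acoustic
# score, but its second word onset matches the detected prosodic unit at 0.62 s.
units = [0.0, 0.62]
a = Hypothesis(["w1", "w2"], [0.0, 0.40], acoustic_logp=-100.0, lm_logp=-5.0)
b = Hypothesis(["w1", "w3"], [0.0, 0.60], acoustic_logp=-101.0, lm_logp=-5.0)
best = rescore([a, b], units)[0]
```

With `prosody_weight=0.0` the acoustically better hypothesis `a` stays on top; with the prosodic bonus enabled, `b` overtakes it because both of its word onsets agree with the detected prosodic units. In practice the weights would have to be tuned on held-out data.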

Editor information

Anna Esposito, Marcos Faundez-Zanuy, Eric Keller, Maria Marinaro

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Szaszák, G., Vicsi, K. (2007). Using Prosody in Fixed Stress Languages for Improvement of Speech Recognition. In: Esposito, A., Faundez-Zanuy, M., Keller, E., Marinaro, M. (eds) Verbal and Nonverbal Communication Behaviours. Lecture Notes in Computer Science, vol 4775. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76442-7_13

  • DOI: https://doi.org/10.1007/978-3-540-76442-7_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-76441-0

  • Online ISBN: 978-3-540-76442-7

  • eBook Packages: Computer Science, Computer Science (R0)