
Prosody Prediction from Text in Hungarian and its Realization in TTS Conversion

International Journal of Speech Technology

Abstract

Proper prosodic structure is crucial for natural-sounding synthesized speech. Because no other information on discourse structure is available, we rely on syntactic structure to predict the main prosodic elements of normal speech. To this end, a dependency-based parser has been developed for Hungarian that marks the boundaries of functional constituents in the sentence, that is, the places where new intonation patterns start and breaks can be inserted. We determine the stress distribution in the sentence using four levels, including focus. The practical realization of the prosodic predictor also relies on statistical and empirical data. The intonation units (tone groups) and their melodies (e.g., falling, slowly falling, level, rising, slowly rising, rising-falling, and falling-rising) are established on the basis of syntactic properties in declarative, interrogative, and imperative sentences. The results are embedded in an experimental Hungarian text-to-speech (TTS) system.
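The kind of annotation the abstract describes can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the seven melody names and the existence of four stress levels with focus as one of them come from the abstract, while the names `Melody`, `Stress`, `ToneGroup`, and `split_at_boundaries`, the labels of the three non-focus stress levels, and the representation of boundaries as word indices are all hypothetical. The parser that would actually supply the boundaries is not modeled here.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum, IntEnum


class Melody(Enum):
    # The seven melody types listed in the abstract.
    FALLING = "falling"
    SLOWLY_FALLING = "slowly falling"
    LEVEL = "level"
    RISING = "rising"
    SLOWLY_RISING = "slowly rising"
    RISING_FALLING = "rising-falling"
    FALLING_RISING = "falling-rising"


class Stress(IntEnum):
    # Four levels including focus. Only FOCUS is named in the abstract;
    # the other three labels are assumptions made for this sketch.
    NONE = 0
    WEAK = 1
    STRONG = 2
    FOCUS = 3


@dataclass
class ToneGroup:
    # One intonation unit: its words, a stress level per word, one melody.
    words: list[str]
    stress: list[Stress]
    melody: Melody


def split_at_boundaries(words: list[str], boundaries: list[int]) -> list[list[str]]:
    """Split `words` into consecutive chunks.

    Each index in `boundaries` is a position where a new chunk starts
    (hypothetical stand-in for constituent boundaries from a parser).
    Indices outside 1..len(words)-1 are ignored.
    """
    if not words:
        return []
    starts = [0] + sorted(b for b in set(boundaries) if 0 < b < len(words))
    ends = starts[1:] + [len(words)]
    return [words[s:e] for s, e in zip(starts, ends)]


# Usage with placeholder tokens: a boundary before the third word.
chunks = split_at_boundaries(["w1", "w2", "w3", "w4"], [2])
groups = [
    ToneGroup(chunk, [Stress.NONE] * len(chunk), Melody.FALLING)
    for chunk in chunks
]
```

The melody and stress values assigned in the usage lines are placeholders only; how the system chooses them from syntactic properties and sentence type is not specified in the abstract and is deliberately left out.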




Cite this article

Koutny, I., Olaszy, G. & Olaszi, P. Prosody Prediction from Text in Hungarian and its Realization in TTS Conversion. International Journal of Speech Technology 3, 187–200 (2000). https://doi.org/10.1023/A:1026519300902
