Abstract
Proper prosodic structure is crucial for natural-sounding synthesized speech. Because no other information on discourse structure is available, we rely on syntactic structure to predict the main prosodic elements of normal speech. To this end, a dependency-based parser has been developed for Hungarian that marks the boundaries of functional constituents in the sentence, that is, the places where new intonation patterns start and breaks can be inserted. Stress distribution in the sentence is determined using four stress levels, including focus. The practical realization of the prosodic predictor also relies on statistical and empirical data. The intonation units (tone groups), each with an appropriate melody (e.g., falling, slowly falling, level, rising, slowly rising, rising-falling, or falling-rising), are established on the basis of syntactic properties in declarative, interrogative, and imperative sentences. The results are embedded in an experimental Hungarian text-to-speech (TTS) system.
Koutny, I., Olaszy, G. & Olaszi, P. Prosody Prediction from Text in Hungarian and its Realization in TTS Conversion. International Journal of Speech Technology 3, 187–200 (2000). https://doi.org/10.1023/A:1026519300902