Abstract
This paper describes Profivox, the latest Hungarian text-to-speech (TTS) system developed for telephone-based applications. Its main features are an intelligible, human-like voice; robust software designed for continuous operation; fully automatic conversion of declarative sentences (both short and very long) and questions; and real-time parallel operation on at least 30 channels. The concepts of prosody generation and sound duration processing are introduced, and the Profivox development environment is presented. Hungary's market-leading mobile service provider uses the TTS system in an automatic e-mail reading application.
Cite this article
Olaszy, G., Németh, G., Olaszi, P. et al. Profivox—A Hungarian Text-to-Speech System for Telecommunications Applications. International Journal of Speech Technology 3, 201–215 (2000). https://doi.org/10.1023/A:1026558915015