Abstract
While text-to-speech (TTS) systems for the major world languages are quite advanced, smaller languages such as our own Slovenian lack high-quality TTS synthesis. At the “Jožef Stefan” Institute, a system called SPEAKER (GOVOREC) has been developed that can automatically convert any Slovenian text into speech. The synthesis task is divided among several independent modules that operate in sequence: text analysis, prosody generation and segmental concatenation. The first module comprises text normalization and grapheme-to-phoneme conversion. To generate rules for our synthesis scheme, data were collected by analysing readings by ten speakers, five male and five female. A two-level approach has been used for duration modeling, and a so-called superpositional approach for pitch modeling. The speech waveform is synthesized using unit-selection methods and a concatenative TD-PSOLA or HNM+ technique. The system was first implemented in the EMA employment agent, which provides information about available jobs in Slovenia and is now used by members of the Slovenian Foundation for the Blind and Vision-Impaired. It was subsequently made available free of charge to all people with disabilities. The system was awarded first prize for innovation in the field of life improvements for people with disabilities by the Government Office for the Disabled and Chronically Sick of the Republic of Slovenia. SPEAKER is freely accessible for non-commercial purposes over the Internet. Several leading Slovenian telecommunication companies are currently testing the system for delivering information such as e-mail, short messaging service (SMS) messages, weather reports and traffic information through mobile phones.
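The sequential module structure described in the abstract (text analysis, then prosody generation, then segmental concatenation) can be illustrated with a minimal Python sketch. This is not the SPEAKER implementation: every function name, the two-entry digit table, the letter-per-phoneme mapping, the fixed durations and the linearly falling pitch are hypothetical placeholders chosen only to show how independent stages might be chained. In particular, the toy prosody stage does not reproduce the two-level duration model or the superpositional pitch model, and the last stage returns labels instead of a TD-PSOLA or HNM+ waveform.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One phoneme-sized unit passed between the modules (hypothetical type)."""
    phoneme: str
    duration_ms: float = 0.0
    f0_hz: float = 0.0


def normalize(text: str) -> str:
    """Toy text normalization: lowercase and expand two digits.

    The digit table is illustrative only, not the system's rule set.
    """
    digits = {"1": "ena", "2": "dve"}
    return " ".join(digits.get(tok, tok) for tok in text.lower().split())


def grapheme_to_phoneme(text: str) -> List[Segment]:
    """Toy G2P: one segment per letter; spaces and punctuation are dropped."""
    return [Segment(ch) for ch in text if ch.isalpha()]


def analyze_text(text: str) -> List[Segment]:
    """Module 1: text normalization followed by grapheme-to-phoneme conversion."""
    return grapheme_to_phoneme(normalize(text))


def generate_prosody(segments: List[Segment],
                     base_f0: float = 120.0,
                     declination: float = 1.0) -> List[Segment]:
    """Module 2 (placeholder): fixed durations and a linearly falling pitch."""
    vowels = set("aeiou")
    for i, seg in enumerate(segments):
        seg.duration_ms = 90.0 if seg.phoneme in vowels else 60.0
        seg.f0_hz = base_f0 - declination * i
    return segments


def concatenate(segments: List[Segment]) -> List[str]:
    """Module 3 (placeholder): return unit labels instead of audio samples."""
    return [f"{s.phoneme}:{s.duration_ms:.0f}ms@{s.f0_hz:.0f}Hz" for s in segments]


def synthesize(text: str) -> List[str]:
    """Run the three modules in sequence, each consuming the previous output."""
    return concatenate(generate_prosody(analyze_text(text)))
```

Calling `synthesize("Dan 1")` passes the text through all three stages and yields one label per letter of the normalized string. Because each stage only depends on the output of the one before it, any stage could in principle be replaced (for example, by a different waveform generator) without touching the others.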
Cite this article
Šef, T., Gams, M. SPEAKER (GOVOREC): A Complete Slovenian Text-to Speech System. International Journal of Speech Technology 6, 277–287 (2003). https://doi.org/10.1023/A:1023470304749
DOI: https://doi.org/10.1023/A:1023470304749