
SPEAKER (GOVOREC): A Complete Slovenian Text-to Speech System

Published in: International Journal of Speech Technology

Abstract

While text-to-speech (TTS) systems for the major world languages are quite advanced, smaller languages such as Slovenian lack high-quality TTS synthesis. A system called SPEAKER (GOVOREC), developed at the “Jožef Stefan” Institute, automatically converts any Slovenian text into speech. The phases of the synthesis task are performed by independent modules that operate in sequence: text analysis, prosody generation, and segmental concatenation. The text analysis module comprises text normalization and grapheme-to-phoneme conversion. To generate rules for the synthesis scheme, data were collected by analysing readings by ten speakers, five male and five female. Duration is modeled with a two-level approach, and pitch with a so-called superpositional approach. The speech waveform is synthesized using unit-selection methods and a concatenative TD-PSOLA or HNM+ technique. The system was first implemented in the EMA employment agent, which provides information about available jobs in Slovenia and is now used by members of the Slovenian Foundation for the Blind and Vision-Impaired. It was then made available free of charge to all people with disabilities. The system was awarded first prize for innovation in the field of life improvements for people with disabilities by the Government Office for the Disabled and Chronically Sick of the Republic of Slovenia. SPEAKER is freely accessible over the Internet for non-commercial purposes. Several leading Slovenian telecommunication companies are currently testing the system for delivering information (e-mail, short message service (SMS), weather reports, traffic information) through mobile phones.
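The idea behind a superpositional pitch model can be illustrated with a small sketch: the fundamental frequency (F0) contour is built by adding a slow phrase-level component and local accent-level components to a speaker baseline. The abstract does not give the form of SPEAKER's components, so everything below is an assumption made for illustration: the linear declination, the Gaussian accent shape, addition in Hz rather than in the log domain, and all function names and parameter values.

```python
import math


def phrase_component(t, duration, start_hz=20.0):
    """Phrase-level declination: falls linearly from start_hz to 0 Hz.

    The linear shape is an illustrative assumption.
    """
    return start_hz * (1.0 - t / duration)


def accent_component(t, center, width, height_hz):
    """Accent-level bump, modeled here as a Gaussian (an assumption)."""
    return height_hz * math.exp(-0.5 * ((t - center) / width) ** 2)


def f0_contour(times, duration, base_hz, accents):
    """Superpose baseline, phrase and accent components at each time point.

    accents is a list of (center_s, width_s, height_hz) tuples.
    """
    contour = []
    for t in times:
        f0 = base_hz + phrase_component(t, duration)
        f0 += sum(accent_component(t, c, w, h) for c, w, h in accents)
        contour.append(f0)
    return contour


if __name__ == "__main__":
    duration = 2.0                          # utterance length in seconds
    times = [i * 0.25 for i in range(9)]    # 0.00 s .. 2.00 s
    accents = [(0.5, 0.1, 30.0), (1.5, 0.1, 15.0)]
    for t, f0 in zip(times, f0_contour(times, duration, 100.0, accents)):
        print(f"{t:.2f} s  {f0:.1f} Hz")
```

Because the components are simply summed, each level can be estimated and adjusted separately, which is the main appeal of a superpositional scheme over a single monolithic contour model.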




Cite this article

Šef, T., Gams, M. SPEAKER (GOVOREC): A Complete Slovenian Text-to Speech System. International Journal of Speech Technology 6, 277–287 (2003). https://doi.org/10.1023/A:1023470304749

  • DOI: https://doi.org/10.1023/A:1023470304749
