SPEAKER (GOVOREC): A Complete Slovenian Text-to Speech System

Šef, Tomaž; Gams, Matjaž

doi:10.1023/A:1023470304749

Published: July 2003

Volume 6, pages 277–287, (2003)

International Journal of Speech Technology

Tomaž Šef¹ &
Matjaž Gams¹

Abstract

While text-to-speech (TTS) systems for major world languages are quite advanced, smaller languages such as Slovenian lack quality TTS synthesis. At the “Jožef Stefan” Institute, a system called SPEAKER (GOVOREC) has been developed that can automatically convert any Slovenian text into speech. The phases of the synthesis task are performed by several independent modules operating in sequence: text analysis, prosody generation and segmental concatenation. The first module comprises text normalization and grapheme-to-phoneme conversion. To generate rules for the synthesis scheme, data were collected by analysing the readings of ten speakers, five male and five female. A two-level approach is used for duration modeling, and a so-called superpositional approach for pitch modeling. The speech waveform is synthesized using unit selection-based methods and a concatenative TD-PSOLA or HNM+ technique. The system was first implemented in the EMA employment agent, which provides information about available jobs in Slovenia and is now used by members of the Slovenian Foundation for the Blind and Vision-Impaired. It was then given free of charge to all people with disabilities. The system was awarded first prize for innovation in the field of life improvements for people with disabilities by the Government Office for the Disabled and Chronically Sick of the Republic of Slovenia. SPEAKER is freely accessible for non-commercial purposes through the Internet. Currently, several leading Slovenian telecommunication companies are testing the system for delivering information, including e-mail, short messaging service (SMS), weather reports and traffic information, through mobile phones.
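
The sequential module structure described in the abstract (text analysis, then prosody generation, then segmental concatenation) can be illustrated with a minimal Python sketch. This is not the SPEAKER implementation: the Segment type, the function names, the one-letter-per-phoneme mapping, the constant 80 ms duration and the linearly falling pitch are all hypothetical placeholders, chosen only to show how independent stages might pass data to one another.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical unit passed between modules (not from the paper)."""
    phoneme: str
    duration_ms: float = 0.0
    pitch_hz: float = 0.0


def text_analysis(text: str) -> List[Segment]:
    """Toy stand-in for text normalization and grapheme-to-phoneme conversion.

    Assumption: whitespace is collapsed, text is lowercased, and every
    letter becomes one 'phoneme'. A real system uses rules and lexicons.
    """
    normalized = " ".join(text.lower().split())
    return [Segment(phoneme=ch) for ch in normalized if ch.isalpha()]


def prosody_generation(segments: List[Segment]) -> List[Segment]:
    """Toy stand-in for duration and pitch modeling.

    Assumption: constant duration and a linearly declining pitch, purely
    for illustration; the paper's two-level and superpositional models
    are not reproduced here.
    """
    n = max(len(segments), 1)
    for i, seg in enumerate(segments):
        seg.duration_ms = 80.0
        seg.pitch_hz = 120.0 - 20.0 * i / n
    return segments


def segmental_concatenation(segments: List[Segment]) -> List[str]:
    """Toy stand-in for waveform synthesis: returns unit labels, not audio."""
    return [f"{s.phoneme}:{s.duration_ms:.0f}ms@{s.pitch_hz:.0f}Hz" for s in segments]


def synthesize(text: str) -> List[str]:
    """Run the three modules in sequence, each consuming the previous output."""
    return segmental_concatenation(prosody_generation(text_analysis(text)))


if __name__ == "__main__":
    for unit in synthesize("Dober dan"):
        print(unit)
```

Because each stage only depends on the output of the previous one, any stage in such a design could in principle be replaced (for example, swapping one concatenation technique for another) without touching the others.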

Author information

Authors and Affiliations

Jožef Stefan Institute, Jamova 39, SI-1000, Ljubljana, Slovenia
Tomaž Šef & Matjaž Gams

About this article

Cite this article

Šef, T., Gams, M. SPEAKER (GOVOREC): A Complete Slovenian Text-to Speech System. International Journal of Speech Technology 6, 277–287 (2003). https://doi.org/10.1023/A:1023470304749

Issue Date: July 2003
DOI: https://doi.org/10.1023/A:1023470304749