Abstract
This paper presents a two-level lexical stress assignment model for out-of-vocabulary Slovenian words, used in our text-to-speech system. At the first level, each vowel (and the consonant ’r’) is classified as stressed or unstressed, and a type of lexical stress is assigned to every stressed vowel (and ’r’), using a machine-learning technique (decision trees or boosted decision trees). At the second level, corrections are made on the word level according to the number of stressed vowels and the length of the word. As data we used the MULTEXT-East Slovene Lexicon, supplemented with lexical stress marks. The accuracy achieved by decision trees significantly outperforms all previous results. However, the large size of the resulting trees indicates that accentuation in Slovenian is a very complex problem that cannot be solved with a small set of simple rules.
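The two-level procedure described above can be sketched as follows. This is a minimal, stdlib-only illustration under stated assumptions, not the authors' implementation. Plain ID3 stands in for the (boosted) decision trees. Each candidate letter is described only by a window of two letters on either side. The syllabic-r test is simplified. The word-level rules (at least one stress per word, at most one for words of up to a hypothetical `max_single_len` letters) and the 0.5 threshold are also hypothetical. Only the stressed/unstressed decision is shown; the stress type could be predicted the same way with multi-class labels.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def stress_candidates(word):
    """Letter positions that may carry stress: the vowels, plus an 'r' with
    no vowel on either side (an assumed, simplified test for syllabic r)."""
    w = word.lower()
    out = []
    for i, ch in enumerate(w):
        if ch in VOWELS:
            out.append(i)
        elif ch == "r":
            left = w[i - 1] if i > 0 else ""
            right = w[i + 1] if i + 1 < len(w) else ""
            if left not in VOWELS and right not in VOWELS:
                out.append(i)
    return out


def window(word, i, size=2):
    """Letters from i-size to i+size, padded with '_' at the word edges."""
    padded = "_" * size + word.lower() + "_" * size
    return tuple(padded[i:i + 2 * size + 1])


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def id3(rows, labels, features):
    """Plain ID3 on categorical features. A leaf is a Counter of labels;
    an inner node is (feature, {value: subtree}, Counter of labels)."""
    counts = Counter(labels)
    if len(counts) == 1 or not features:
        return counts

    def split(f):
        groups = {}
        for row, lab in zip(rows, labels):
            rs, ls = groups.setdefault(row[f], ([], []))
            rs.append(row)
            ls.append(lab)
        return groups

    def gain(f):
        n = len(labels)
        rem = sum(len(ls) / n * entropy(ls) for _, ls in split(f).values())
        return entropy(labels) - rem

    best = max(features, key=gain)
    if gain(best) <= 1e-12:  # no informative split left
        return counts
    rest = [f for f in features if f != best]
    branches = {v: id3(rs, ls, rest) for v, (rs, ls) in split(best).items()}
    return (best, branches, counts)


def p_stressed(tree, row):
    """Share of label 1 at the node reached by row; a letter unseen at an
    inner node falls back to that node's own label counts."""
    while isinstance(tree, tuple):
        feature, branches, counts = tree
        if row[feature] not in branches:
            tree = counts
            break
        tree = branches[row[feature]]
    total = sum(tree.values())
    return tree[1] / total if total else 0.0


def train(examples, size=2):
    """First level: examples are (word, set of stressed letter indices)."""
    rows, labels = [], []
    for word, stressed in examples:
        for i in stress_candidates(word):
            rows.append(window(word, i, size))
            labels.append(1 if i in stressed else 0)
    return id3(rows, labels, list(range(2 * size + 1)))


def correct(scores, word_len, max_single_len=10):
    """Second level (hypothetical rules): a word with candidates gets at
    least one stress; words up to max_single_len letters keep only the
    most probable one. Returns indices into scores."""
    stressed = [k for k, p in enumerate(scores) if p >= 0.5]
    if scores and not stressed:
        stressed = [max(range(len(scores)), key=scores.__getitem__)]
    elif len(stressed) > 1 and word_len <= max_single_len:
        stressed = [max(stressed, key=scores.__getitem__)]
    return stressed


def assign_stress(tree, word, size=2):
    """Return the letter indices of word that receive stress."""
    cands = stress_candidates(word)
    scores = [p_stressed(tree, window(word, i, size)) for i in cands]
    return [cands[k] for k in correct(scores, len(word))]
```

`train` takes (word, set of stressed letter indices) pairs, for example `("miza", {1})` for míza, and `assign_stress(tree, word)` returns the stressed letter positions of a new word. Falling back to a node's label counts for unseen letters is one simple way to handle contexts that never occurred in training.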
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Šef, T., Škrjanc, M., Gams, M. (2002). Automatic Lexical Stress Assignment of Unknown Words for Highly Inflected Slovenian Language. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2002. Lecture Notes in Computer Science, vol 2448. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46154-X_23
DOI: https://doi.org/10.1007/3-540-46154-X_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44129-8
Online ISBN: 978-3-540-46154-8
eBook Packages: Springer Book Archive