Automatic Lexical Stress Assignment of Unknown Words for Highly Inflected Slovenian Language

Šef, Tomaž; Škrjanc, Maja; Gams, Matjaž

doi:10.1007/3-540-46154-X_23

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2448)

Included in the following conference series:

International Conference on Text, Speech and Dialogue

Abstract

This paper presents a two-level lexical stress assignment model for out-of-vocabulary Slovenian words, used in our text-to-speech system. First, each vowel (and the consonant 'r') is classified as stressed or unstressed, and a type of lexical stress is assigned to every stressed vowel (and consonant 'r'). For this step we applied a machine-learning technique (decision trees or boosted decision trees). Then, corrections are made at the word level, according to the number of stressed vowels and the length of the word. As data sets we used the MULTEXT-East Slovene Lexicon, supplemented with lexical stress marks. The accuracy achieved by decision trees significantly outperforms all previous results. However, the sizes of the trees indicate that accentuation in the Slovenian language is a very complex problem and that a solution in the form of relatively simple rules is not possible.
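The two-level structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the trained decision tree is replaced by a stand-in scoring function (`toy_scorer`), the stress type is not modelled, and the word-level rule (force one stress if none was found; keep only the best candidate in short words) is an assumption chosen only to show a correction that depends on the number of stressed vowels and the word length. The names `nuclei`, `context`, `assign_stress`, `threshold`, and `short_word_len` are all hypothetical.

```python
from typing import Callable, Dict, List

VOWELS = set("aeiou")


def nuclei(word: str) -> List[int]:
    """Positions of candidate stress carriers: vowels, plus 'r' with no
    vowel on either side (a simplifying assumption for syllabic 'r')."""
    w = word.lower()
    found = []
    for i, ch in enumerate(w):
        if ch in VOWELS:
            found.append(i)
        elif ch == "r":
            prev_vowel = i > 0 and w[i - 1] in VOWELS
            next_vowel = i + 1 < len(w) and w[i + 1] in VOWELS
            if not prev_vowel and not next_vowel:
                found.append(i)
    return found


def context(word: str, i: int, width: int = 3) -> str:
    """Letter window centred on position i, padded with '_' at word edges."""
    padded = "_" * width + word.lower() + "_" * width
    j = i + width
    return padded[j - width : j + width + 1]


def assign_stress(
    word: str,
    scorer: Callable[[str], float],
    threshold: float = 0.5,
    short_word_len: int = 8,
) -> List[int]:
    """Return the positions judged to carry lexical stress."""
    candidates = nuclei(word)
    if not candidates:
        return []
    scores: Dict[int, float] = {i: scorer(context(word, i)) for i in candidates}

    # Level 1: each candidate is classified independently of the others.
    stressed = [i for i in candidates if scores[i] >= threshold]

    # Level 2: word-level correction (assumed rule, see the lead-in).
    best = max(candidates, key=lambda i: scores[i])
    if not stressed:
        stressed = [best]  # every word gets at least one stress
    elif len(stressed) > 1 and len(word) < short_word_len:
        stressed = [best]  # short words keep only the strongest candidate
    return stressed


def toy_scorer(ctx: str) -> float:
    """Stand-in for a trained classifier: scores the centre letter only."""
    table = {"a": 0.9, "e": 0.6, "i": 0.4, "o": 0.3, "u": 0.2, "r": 0.1}
    return table.get(ctx[len(ctx) // 2], 0.0)


if __name__ == "__main__":
    for w in ["miza", "trg", "tema"]:
        print(w, assign_stress(w, toy_scorer))
```

In a real system the scorer would be a classifier trained on context windows taken from a stress-marked lexicon; the sketch only shows how independent per-position decisions can be reconciled at the word level.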

Author information

Authors and Affiliations

Department of Intelligent Systems, Institute Jožef Stefan, Jamova 39, SI-1000, Ljubljana, Slovenia
Tomaž Šef, Maja Škrjanc & Matjaž Gams

Editor information

Editors and Affiliations

Faculty of Informatics Department of Programming Systems and Communication, Masaryk University, Botanická 68a, 602 00, Brno, Czech Republic
Petr Sojka
Faculty of Informatics Department of Information Technologies, Masaryk University, Botanická 68a, 602 00, Brno, Czech Republic
Ivan Kopeček & Karel Pala

About this paper

Cite this paper

Šef, T., Škrjanc, M., Gams, M. (2002). Automatic Lexical Stress Assignment of Unknown Words for Highly Inflected Slovenian Language. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2002. Lecture Notes in Computer Science, vol 2448. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46154-X_23

DOI: https://doi.org/10.1007/3-540-46154-X_23
Published: 23 August 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44129-8
Online ISBN: 978-3-540-46154-8
eBook Packages: Springer Book Archive
