Labeling of Symbolic Prosody Breaks for the Slovenian Language

International Journal of Speech Technology

Abstract

This paper presents data-driven modelling and prediction of word-level prosody breaks for the Slovenian language. Automatic learning techniques depend on a large, appropriately labeled corpus. The labeling can be done either automatically or by hand. Automatic labeling can be less accurate than hand labeling, but hand labeling is very time consuming and, in some cases, inconsistent. Therefore, a new interactive tool for word-level prosody labeling (major/minor breaks) is presented, together with a new semi-automatic approach for determining prosody breaks. The tool combines the advantages of hand labeling and automatic labeling: it achieves high labeling consistency and reduces the time needed for hand labeling. The labeled Slovenian corpus was used to train our phrase break prediction module, which implements a neural network (NN) structure. Experiments were performed for the data-driven prediction of major = minor and major/minor phrase breaks. The prediction accuracy achieved represents the state of the art in word-level prosody break prediction for the Slovenian language. It is comparable to the accuracy of other approaches that applied more complex NN structures (Müller et al., 2000) or other prediction methods (Black and Taylor, 1997) and used a much larger training corpus. The overall prediction accuracy achieved is 94% for major = minor breaks and over 98/92% for major/minor phrase breaks, respectively.
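The two evaluation settings named in the abstract can be illustrated with a minimal sketch. It assumes that "major = minor" means both break types are merged into a single break class, and that the major/minor figures are accuracies reported for each break type separately; the abstract does not spell out either point. The label names, helper functions, and toy label sequences are hypothetical and are not taken from the paper.

```python
from typing import Sequence

# Hypothetical word-level labels; the paper's actual label inventory may differ.
NONE, MINOR, MAJOR = "none", "minor", "major"


def merge_breaks(labels: Sequence[str]) -> list:
    """Collapse major and minor breaks into one class (assumed reading of 'major = minor')."""
    return ["break" if lab in (MINOR, MAJOR) else NONE for lab in labels]


def accuracy(reference: Sequence, predicted: Sequence) -> float:
    """Fraction of word positions where the predicted label equals the reference label."""
    if not reference or len(reference) != len(predicted):
        raise ValueError("sequences must be non-empty and of equal length")
    hits = sum(r == p for r, p in zip(reference, predicted))
    return hits / len(reference)


def one_vs_rest_accuracy(reference: Sequence[str], predicted: Sequence[str], target: str) -> float:
    """Accuracy of the binary decision 'is this word followed by a `target` break?'."""
    ref_bin = [lab == target for lab in reference]
    pred_bin = [lab == target for lab in predicted]
    return accuracy(ref_bin, pred_bin)


# Toy data: one label per word boundary (invented for illustration only).
reference = [NONE, NONE, MINOR, NONE, MAJOR, NONE, MINOR, MAJOR]
predicted = [NONE, NONE, MAJOR, NONE, MAJOR, NONE, NONE, MAJOR]

merged_acc = accuracy(merge_breaks(reference), merge_breaks(predicted))
major_acc = one_vs_rest_accuracy(reference, predicted, MAJOR)
minor_acc = one_vs_rest_accuracy(reference, predicted, MINOR)

print(f"merged: {merged_acc:.3f}, major: {major_acc:.3f}, minor: {minor_acc:.3f}")
```

In the toy data, confusing a minor break with a major one costs nothing in the merged setting but counts as an error in both per-type scores, which is why the two settings are reported separately.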

References

  • Bavarian Archive of Speech Signals. SI1000 (1998). Prosodic Markers Version 1.0 University of Munich, Institute of Phonetics. Munich, Germany.

  • Black, A.W. and Taylor, P. (1997). Assigning phrase breaks from part-of-speech sequences. Proceedings Eurospeech 97, Rhodes, Greece. pp. 995-998.

  • Benzmüller, R. and Grice, M. (1999). Describing German intonation with GtoBI. Workshop on Intonation: Models on ToBI Labeling. San Francisco.

  • Fackrell, J.W.A., Vereecken H., Martens J.-P., and Van Coile B. (1999). Multilingual prosody modelling using cascades of regression trees and neuronal networks. Proceedings Eurospeech 99, Budapest, Hungary, vol. 4, pp. 1835-1838.

  • Hain, H.-U. (1999). Automation of the training procedure for neural networks performing multilingual grapheme to phoneme conversion. Proceedings Eurospeech 99, Budapest, Hungary, vol. 5, pp. 2087-2090.

  • Hozjan, V. and Stergar, J. (2001). Determination of prominence accent of prosodic segments in emotional speech. Advances in Speech Technology, Maribor, Slovenia (to appear in Proceedings).

  • Kompe, R. (1997). Prosody in Speech Understanding Systems. Springer-Verlag Berlin Heidelberg, Lecture Notes in Artificial Intelligence.

  • Malfrere, F., Dutoit T., and Mertens, P. (1998). Fully automatic prosody generator for text-to-speech. ICSLP 98, Sydney, Australia, pp. 1395-1398.

  • Mihelič, F., Gros, J., Nöth, E., Dobrišek, S., and Žibert, J. (2000). Recognition of selected prosodic events in Slovenian speech, Language Technologies, Ljubljana, Slovenia. pp. 45-48.

  • Müller, A.F., Zimmermann, H.G., and Neuneier, R. (2000). Robust generation of symbolic prosody by a neural classifier based on autoassociators. Proceedings ICASSP 00, Istanbul, Turkey, vol. 3, pp. 1285-1288.

  • Qian, X. and Kumaresan, R. (1996). A variable frame pitch estimator and test results. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 1, Atlanta, GA, pp. 228-231.

  • Neuneier, R. and Zimmermann, H.G. (1998). How to train neural networks. In G.B. Orr, K.-R. Müller (Eds.), Tricks of the trade, Springer Verlag, pp. 395-423.

  • Rojc, M. and Kačič, Z. (2000). Design of optimal Slovenian speech corpus for use in the concatenative speech synthesis system, LREC 00, Athens, Greece, pp. 321-325.

  • Senn Version 3.0 User Manual. SIEMENS AG. 1998.

  • Stergar, J. (2000). Determining symbolic prosody features with analysis of speech corpora. Master Thesis. University of Maribor. Faculty for EE. and Comp. Sci.

  • Stergar, J. and Hozjan, V. (2000). Steps towards preparation of text corpora for data driven symbolic prosody labeling. In T. Erjavec and J. Gros (Eds.), Language Technologies: Proceedings of the Conference, Ljubljana, Slovenia, pp. 82-85.

  • Terken, J. and Collier, R. (1998). The generation of prosodic structure and intonation in speech synthesis. In W.B. Kleijn et al. (Eds.), Speech Coding and Synthesis, Elsevier, pp. 635-662.

  • Toporišič, J. (1991). Slovenska slovnica. Založba obzorja Maribor. Slovenija.

  • Vereecken, H., Martens, J.P., Grover, C., Fackrell, J., and Van Coile, B. (1998). Automatic prosodic labeling of 6 languages. ICSLP 98, Sydney, Australia.

  • Vereecken, H., Vorstermans, A., Martens, J.-P., and Van Coile, B. (1997). Improving the phonetic annotation by means of prosodic phrasing. Proceedings Eurospeech 97, Rhodes, Greece, vol. 1, pp. 179-182.

  • Widera, C., Portele, T., and Wolters, M. (1997). Prediction of word prominence. Proceedings Eurospeech 97, Rhodes, Greece, vol. 2, pp. 999-1002.

Web Reference

  • Institut für Phonetik und sprachliche Kommunikation: Siemens Synthese Korpus-SI1000P, http://www.phonetik.uni-muenchen.de/Bas/.

Cite this article

Stergar, J., Hozjan, V. & Horvat, B. Labeling of Symbolic Prosody Breaks for the Slovenian Language. International Journal of Speech Technology 6, 289–299 (2003). https://doi.org/10.1023/A:1023422421588

  • DOI: https://doi.org/10.1023/A:1023422421588
