A Comparison of Two Prosody Modelling Approaches for Sesotho and Serbian

Mohasi, Lehlohonolo; Sečujski, Milan; Mak, Robert; Niesler, Thomas

doi:10.1007/978-3-319-11581-8_4

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8773)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

Accurate prediction of prosodic features is one of the critical tasks in a text-to-speech system, especially for under-resourced languages with complex lexical prosody. For synthesized speech to have a natural-sounding intonation contour, an adequate prosodic model must be employed. This study compares the Fujisaki model and HMM-based prosodic modelling in the context of text-to-speech synthesis for two unrelated languages with rich prosodic systems: Sesotho, a tonal language of the Bantu family, and Serbian, a South Slavic language with pitch accent. The results of our experiments suggest that, for both languages, the Fujisaki model outperforms the HMM-based model in modelling the intonation contours of natural human speech utterances.
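The Fujisaki model named above is a superpositional model. The logarithm of F0 is the sum of three parts: a constant base value ln Fb, phrase components, and accent components. Phrase components are the responses of a critically damped second-order system to impulse-like phrase commands. Accent components are the responses to rectangular accent commands, which work on tone languages often calls tone commands. The sketch below evaluates this standard formulation for a given set of commands. It assumes the commonly used constants α = 2.0 s⁻¹, β = 20.0 s⁻¹ and γ = 0.9. The class and function names are hypothetical and do not come from this chapter. The sketch only synthesizes a contour from known commands. It does not cover extracting commands from recorded speech or the HMM-based alternative.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class PhraseCommand:
    onset: float      # T0 in seconds (hypothetical field name)
    amplitude: float  # Ap


@dataclass
class AccentCommand:
    onset: float      # T1 in seconds
    offset: float     # T2 in seconds
    amplitude: float  # Aa; may be negative, which lowers F0 locally


def phrase_response(t: float, alpha: float) -> float:
    """Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0, else 0."""
    if t < 0.0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t: float, beta: float, gamma: float) -> float:
    """Ga(t) = min(1 - (1 + beta * t) * exp(-beta * t), gamma) for t >= 0, else 0."""
    if t < 0.0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(t: float, fb: float,
                phrases: List[PhraseCommand],
                accents: List[AccentCommand],
                alpha: float = 2.0, beta: float = 20.0, gamma: float = 0.9) -> float:
    """F0 in Hz at time t:
    ln F0(t) = ln Fb + sum_i Ap_i * Gp(t - T0_i)
                     + sum_j Aa_j * (Ga(t - T1_j) - Ga(t - T2_j))
    The default alpha, beta, gamma are assumed typical values.
    """
    if fb <= 0.0:
        raise ValueError("base frequency Fb must be positive")
    log_f0 = math.log(fb)
    for p in phrases:
        log_f0 += p.amplitude * phrase_response(t - p.onset, alpha)
    for a in accents:
        log_f0 += a.amplitude * (accent_response(t - a.onset, beta, gamma)
                                 - accent_response(t - a.offset, beta, gamma))
    return math.exp(log_f0)


if __name__ == "__main__":
    # Illustrative commands only; not data from the study.
    phrases = [PhraseCommand(onset=-0.2, amplitude=0.5)]
    accents = [AccentCommand(onset=0.3, offset=0.6, amplitude=0.4)]
    # Sample the contour every 10 ms over 2 s.
    contour = [fujisaki_f0(k * 0.01, 80.0, phrases, accents) for k in range(200)]
    print(f"max F0: {max(contour):.1f} Hz")
```

Sampling `fujisaki_f0` on a regular time grid gives a contour that can be compared against a measured F0 track. Amplitudes are left unconstrained so that negative accent (tone) commands can also be represented.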

Author information

Authors and Affiliations

Department of Electrical and Electronic Engineering, Stellenbosch University, South Africa
Lehlohonolo Mohasi & Thomas Niesler
Faculty of Technical Sciences, University of Novi Sad, Serbia
Milan Sečujski & Robert Mak

Editor information

Editors and Affiliations

Speech and Multimodal Interfaces Laboratory, St. Petersburg Institute of Informatics and Automation of the Russian Academy of Sciences, 39, 14th line, 199178, St. Petersburg, Russia
Andrey Ronzhin
Institute of Applied and Mathematical Linguistics, Moscow State Linguistic University, 38, Ostozhenka, 119034, Moscow, Russia
Rodmonga Potapova
Faculty of Technical Sciences, University of Novi Sad, 6, Trg Dositeja Obradovića, 21000, Novi Sad, Serbia
Vlado Delic

About this paper

Cite this paper

Mohasi, L., Sečujski, M., Mak, R., Niesler, T. (2014). A Comparison of Two Prosody Modelling Approaches for Sesotho and Serbian. In: Ronzhin, A., Potapova, R., Delic, V. (eds) Speech and Computer. SPECOM 2014. Lecture Notes in Computer Science, vol 8773. Springer, Cham. https://doi.org/10.1007/978-3-319-11581-8_4

DOI: https://doi.org/10.1007/978-3-319-11581-8_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11580-1
Online ISBN: 978-3-319-11581-8
eBook Packages: Computer Science, Computer Science (R0)