A Computational Model of Intonation for Yorùbá Text-to-Speech Synthesis: Design and Analysis

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3206)

Abstract

In this paper we present the design and analysis of an intonation model for text-to-speech (TTS) synthesis applications that combines Relational Tree (RT) and Fuzzy Logic (FL) techniques. The model is demonstrated on Standard Yorùbá (SY). In the proposed intonation model, phonological information extracted from text is converted into an RT. An RT is a data structure that represents the peaks and valleys of a waveform, together with their spatial arrangement, symbolically in the form of a tree. An initial approximation to the RT, called the Skeletal Tree (ST), is first generated algorithmically. The exact numerical values of the peaks and valleys on the ST are then computed using FL. Quantitative analysis of the results gives root-mean-square errors (RMSE) of 0.56 and 0.71 for peaks and valleys, respectively. Mean Opinion Scores (MOS) of 9.5 and 6.8, on a scale of 1–10, were obtained for intelligibility and naturalness, respectively.
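The abstract describes the RT only at a high level. The sketch below illustrates one simple way to encode the peaks and valleys of a pitch contour as a tree, together with the RMSE metric used in the evaluation. It is an assumption, not the authors' construction: the contour is reduced to alternating local maxima and minima, and the deepest valley recursively splits the contour into left and right subtrees, so peaks become leaves and valleys become internal nodes. The fuzzy-logic step that assigns exact values is not sketched. All names (`RTNode`, `find_extrema`, `build_tree`, `peaks`, `rmse`) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Extremum = Tuple[str, int, float]  # (kind, sample index, value)


@dataclass
class RTNode:
    """Node of a simplified, hypothetical relational tree: peaks are leaves, valleys are internal nodes."""
    kind: str                       # "peak" or "valley"
    index: int                      # sample index of the extremum in the contour
    value: float                    # contour value (e.g. F0 in Hz) at that index
    left: Optional["RTNode"] = None
    right: Optional["RTNode"] = None


def find_extrema(contour: Sequence[float]) -> List[Extremum]:
    """Return interior local extrema of the contour as alternating peak/valley triples."""
    # Collapse plateaus so neighbouring samples always differ; keep the first index of each run.
    runs: List[Tuple[int, float]] = []
    for i, x in enumerate(contour):
        if not runs or x != runs[-1][1]:
            runs.append((i, x))
    ext: List[Extremum] = []
    for k in range(1, len(runs) - 1):
        prev, (i, x), nxt = runs[k - 1][1], runs[k], runs[k + 1][1]
        if x > prev and x > nxt:
            ext.append(("peak", i, x))
        elif x < prev and x < nxt:
            ext.append(("valley", i, x))
    # With no equal neighbours, peaks and valleys alternate; trim so the list starts and ends on a peak.
    while ext and ext[0][0] == "valley":
        ext.pop(0)
    while ext and ext[-1][0] == "valley":
        ext.pop()
    return ext


def build_tree(ext: List[Extremum]) -> Optional[RTNode]:
    """Recursively split the extrema at the deepest valley; each side becomes a subtree."""
    if not ext:
        return None
    if len(ext) == 1:
        kind, i, x = ext[0]
        return RTNode(kind, i, x)
    # Valleys sit at odd positions because the list alternates peak, valley, ..., peak.
    k = min(range(1, len(ext), 2), key=lambda j: ext[j][2])
    _, i, x = ext[k]
    return RTNode("valley", i, x, build_tree(ext[:k]), build_tree(ext[k + 1:]))


def to_str(node: Optional[RTNode]) -> str:
    """Bracketed text form of the tree, e.g. V105(P120,P125)."""
    if node is None:
        return "()"
    if node.kind == "peak":
        return f"P{node.value:g}"
    return f"V{node.value:g}({to_str(node.left)},{to_str(node.right)})"


def peaks(node: Optional[RTNode]) -> List[float]:
    """Peak values (the leaves) in left-to-right order."""
    if node is None:
        return []
    if node.kind == "peak":
        return [node.value]
    return peaks(node.left) + peaks(node.right)


def rmse(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Root-mean-square error between predicted and reference values."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(reference))


if __name__ == "__main__":
    contour = [100, 120, 110, 130, 105, 125, 115]  # toy F0 values, not data from the paper
    tree = build_tree(find_extrema(contour))
    print(to_str(tree))   # V105(V110(P120,P130),P125)
    print(peaks(tree))    # [120, 130, 125]
```

Splitting at the deepest valley keeps the tree's shape invariant to small changes in peak heights, which is why a tree of this kind can serve as a skeleton whose numerical values are filled in by a separate step. In the same spirit, `rmse` could compare predicted and measured peak (or valley) values listed in the same order.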

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Odéjobí, O.A., Beaumont, A.J., Wong, S.H.S. (2004). A Computational Model of Intonation for Yorùbá Text-to-Speech Synthesis: Design and Analysis. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2004. Lecture Notes in Computer Science, vol 3206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30120-2_52

  • DOI: https://doi.org/10.1007/978-3-540-30120-2_52

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23049-6

  • Online ISBN: 978-3-540-30120-2

  • eBook Packages: Springer Book Archive