A Computational Model of Intonation for Yorùbá Text-to-Speech Synthesis: Design and Analysis

Odéjobí, Odétúnjí A.; Beaumont, Anthony J.; Wong, Shun Ha Sylvia

doi:10.1007/978-3-540-30120-2_52

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3206)

Abstract

In this paper we present the design and analysis of an intonation model for text-to-speech (TTS) synthesis applications using a combination of Relational Tree (RT) and Fuzzy Logic (FL) technologies. The model is demonstrated using the Standard Yorùbá (SY) language. In the proposed intonation model, phonological information extracted from text is converted into an RT. An RT is a data structure that symbolically represents the peaks and valleys of a waveform, as well as its spatial structure, in the form of a tree. An initial approximation to the RT, called the Skeletal Tree (ST), is first generated algorithmically. The exact numerical values of the peaks and valleys on the ST are then computed using FL. Quantitative analysis of the results gives RMSE values of 0.56 and 0.71 for peaks and valleys respectively. Mean Opinion Scores (MOS) of 9.5 and 6.8, on a scale of 1–10, were obtained for intelligibility and naturalness respectively.
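To make the idea of a tree over peaks and valleys concrete, the following is a minimal sketch and not the authors' algorithm. It assumes one common construction: the deepest valley becomes the root and splits the contour in two, each half is split recursively, and the peaks end up as leaves. The names `Node`, `build_tree`, and `peak_order`, and the sample f0 values, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    kind: str                        # "peak" (leaf) or "valley" (internal node)
    index: int                       # position of the extremum in the input list
    value: float                     # illustrative f0 value at that extremum
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(values: List[float], lo: int = 0, hi: Optional[int] = None) -> Node:
    """Build a tree over an alternating peak/valley sequence.

    Assumption: the sequence starts and ends on a peak, so peaks sit at
    even indices and valleys at odd indices, and the length is odd.
    """
    if hi is None:
        if len(values) % 2 == 0:
            raise ValueError("expected an odd-length peak/valley sequence")
        hi = len(values) - 1
    if lo == hi:
        # A single remaining extremum is a peak and becomes a leaf.
        return Node("peak", lo, values[lo])
    # The deepest valley in the range splits it (first one wins on ties).
    split = min(range(lo + 1, hi, 2), key=lambda i: values[i])
    return Node(
        "valley",
        split,
        values[split],
        left=build_tree(values, lo, split - 1),
        right=build_tree(values, split + 1, hi),
    )


def peak_order(node: Node) -> List[int]:
    """Return the indices of the leaf peaks from left to right."""
    if node.kind == "peak":
        return [node.index]
    return peak_order(node.left) + peak_order(node.right)


# Hypothetical contour: peak, valley, peak, valley, peak, valley, peak
contour = [120.0, 100.0, 140.0, 90.0, 130.0, 110.0, 125.0]
root = build_tree(contour)
```

In this sketch the tree fixes only the relative arrangement of peaks and valleys. In the model described above, the numerical values attached to the peaks and valleys are computed separately using fuzzy logic, which the sketch does not attempt to reproduce.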

Author information

Authors and Affiliations

Computer Science, Aston University, Aston Triangle, Birmingham, B4 7ET, United Kingdom
Odétúnjí A. Odéjobí, Anthony J. Beaumont & Shun Ha Sylvia Wong

Editor information

Editors and Affiliations

Faculty of Informatics, Masaryk University, Brno, Czech Republic
Petr Sojka
Faculty of Informatics, Masaryk University, Botanická 68a, CZ-602 00, Brno, Czech Republic
Ivan Kopeček
Faculty of Informatics, Department of Computer Graphics and Design, Masaryk University, Botanická 68a, 602 00, Brno, Czech Republic
Karel Pala

About this paper

Cite this paper

Odéjobí, O.A., Beaumont, A.J., Wong, S.H.S. (2004). A Computational Model of Intonation for Yorùbá Text-to-Speech Synthesis: Design and Analysis. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2004. Lecture Notes in Computer Science, vol 3206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30120-2_52

DOI: https://doi.org/10.1007/978-3-540-30120-2_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23049-6
Online ISBN: 978-3-540-30120-2
eBook Packages: Springer Book Archive
