Improved Syllable-Based Text to Speech Synthesis for Tone Language Systems

Ekpenyong, Moses; Udoh, EmemObong; Udosen, Escor; Urua, Eno-Abasi

doi:10.1007/978-3-319-08958-4_1


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8387)

Included in the following conference series:

Language and Technology Conference


Abstract

In this contribution, we document the progress made towards a generic and replicable system that is applicable not only to Nigerian languages but also to other African languages. The current system implements a state-of-the-art approach based on Hidden Markov Models (HMMs) and aims at a hybridised version whose front-end components would serve other NLP tasks as well as future research and development. As an extension of our LTC’2011 paper, we continue to tackle the language-specific problems and the ‘unity of purpose’ phenomenon for tone language systems, and to improve the speech quality. Specifically, we address issues bordering on tone modelling using syllables as basic synthesis units, with an ‘eyeball’ assessment of the synthesised speech quality. The results of this research offer hope for further improvements, and we envisage an unsupervised system that minimises the labour-intensive aspects of the current design. Also, with the active collaboration network established in the course of this research, we are certain that a more robust system that would serve a wide variety of applications will evolve.


References

Masuko, T.: HMM-based speech synthesis and its applications. Ph.D. thesis, Tokyo, Japan (2002)
Yoshimura, T., Tokuda, K., Masuko, T., Kobayashi, T., Kitamura, T.: Simultaneous modeling of spectrum, pitch and duration in HMM based speech synthesis. In: EUROSPEECH Conference (1999)
Zen, H., Toda, T., Nakamura, M., Tokuda, K.: Details of Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005. IEICE Trans. Inf. Syst. E90-D(1), 325–333 (2007)
Ling, Z.-H., Wu, Y.-J., Wang, Y.-P., Qin, L., Wang, R.H.: USTC system for Blizzard Challenge 2006: an improved HMM based speech synthesis method. In: Blizzard Challenge (2006)
Black, A., Zen, H., Tokuda, K.: Statistical parametric synthesis. In: ICASSP, Hawaii, pp. 1229–1232 (2007)
Raitio, T.: Hidden Markov model based Finnish text-to-speech system utilizing glottal inverse filtering. M.Sc. thesis, Espoo, Finland (2008)
Guan, Y., Tian, J., Wu, Y.-J., Yamagishi, J., Nurminen, J.: A unified and automatic approach to Mandarin HTS system. In: 7th ISCA Speech Synthesis Workshop, pp. 1–5 (2010)
King, S.: A Tutorial on HMM Speech Synthesis (Invited Paper). In: Sadhana - Academy Proceedings in Engineering Sciences, Indian Institute of Sciences (2010)
Zen, H., Oura, K., Nose, T., Yamagishi, J., Sako, S., Toda, T., Masuko, T., Black, A.W., Tokuda, K.: Recent development of the HMM-based speech synthesis system (HTS). In: APSIPA Annual Summit and Conference, Sapporo, Japan, pp. 121–130 (2009)
Fukada, T., Tokuda, K., Kobayashi, T., Imai, S.: An adaptive algorithm for mel-cepstral analysis of speech. In: ICASSP, pp. 137–140 (1992)
Tokuda, K., Masuko, T., Miyazaki, N., Kobayashi, T.: Hidden Markov models based on multi-space probability distribution for pitch pattern modeling. In: Acoustics, Speech, and Signal Processing, vol. 1, pp. 229–232 (1999)
Imai, S.: Cepstral analysis synthesis on the mel frequency scale. In: ICASSP’83, pp. 93–96 (1983)
Ekpenyong, M., Urua, E.-A., Udosen, E., Udoh, E.: Adaptable phone and syllable HMM-based Ibibio TTS systems. In: Vetulani, Z. (ed.) 5th Language and Technology Conference (LTC), Poznan, Poland, Fundacja Uniwersytetu im. A. Mickiewicza, pp. 355–360 (2011)
Essien, O.E.: A Grammar of the Ibibio Language. University Press Limited, Ibadan (1990)
Simmons, D.: Ibibio verb morphology. Afr. Stud. 16(1), 1–19 (1957)
Urua, E.E.: Aspects of Ibibio phonology and morphology. Ph.D. thesis, Ibadan, Nigeria (1990)
Akinlabi, A., Urua, E.: Foot structure in Ibibio verb. J. Afr. Lang. Linguist. 23, 119–160 (2002). Walter De Gruyter
Ekpenyong, M., Udoh, E.O.: Morpho-syntactic analysis framework for tone language text-to-speech systems. Comput. Inf. Sci. 5(4), 83–101 (2012)
Louw, J.A.: Speect: a multilingual text-to-speech system. In: Proceedings of 19th Annual Symposium of the Pattern Recognition Association of South Africa (PRASA), Cape Town, pp. 165–168 (2008)
Ekpenyong, M.E.: Speech synthesis for tone language systems. Ph.D. thesis, University of Uyo, Nigeria (2013)
Zen, H.: An example context-dependent label format for HMM-based speech synthesis in English. https://wiki.inf.ed.ac.uk/twiki/pub/CSTR/F0parametrisation/hts_lab_format.pdf (2006). Accessed 19 May 2011
Ekpenyong, M., Urua, E.-A., Watts, O., King, S., Yamagishi, J.: Statistical parametric speech synthesis for Ibibio. Speech Commun. First online: February 2013. doi:10.1016/j.specom.2013.02.003
Kawahara, H., Masuda-Katsuse, I., Cheveigne, A.: Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: possible role of a repetitive structure in sounds. Speech Commun. 27(3), 187–207 (1999)
Yamagishi, J., Nose, T., Zen, H., Ling, Z., Toda, T., Tokuda, K., King, S., Renals, S.: Robust speaker-adaptive HMM-based text-to-speech synthesis. IEEE Trans. Audio Speech Lang. Process. 17(6), 1208–1230 (2009)
Gibbon, D., Urua, E.-A., Ekpenyong, M.: Problems and solutions in African tone language text-to-speech. In: International Tutorial and Research Workshop on Multilingual Speech and Language Processing, Stellenbosch, paper 14 (2006)
Ekpenyong, M., Urua, E.-A., Gibbon, D.: Towards an unrestricted domain TTS system for African tone languages. Int. J. Speech Technol. 11, 87–96 (2008)


Acknowledgments

This research has received support from the following grants: the Local Language Speech Technology Initiative (LLSTI) Industry-University grant, the Science and Technology Education Post-Basic (STEP-B)/World Bank assisted Project grant, and the Federal Government of Nigeria (FGN)/Tertiary Education Trust Fund (TETFund) Staff training grant. We also thank Professor Simon King of the Centre for Speech Technology Research (CSTR), University of Edinburgh, Scotland, for agreeing to host part of this research in his laboratory.

Author information

Authors and Affiliations

Department of Computer Science, University of Uyo, Uyo, Nigeria
Moses Ekpenyong
Department of Linguistics and Nigerian Languages, University of Uyo, Uyo, Nigeria
EmemObong Udoh & Eno-Abasi Urua
Department of Linguistics and Communication Studies, University of Calabar, Calabar, Nigeria
Escor Udosen


Corresponding author

Correspondence to Moses Ekpenyong.

Editor information

Editors and Affiliations

Adam Mickiewicz University, Poznań, Poland
Zygmunt Vetulani
IMMI-CNRS, Orsay, France
Joseph Mariani


About this paper

Cite this paper

Ekpenyong, M., Udoh, E., Udosen, E., Urua, EA. (2014). Improved Syllable-Based Text to Speech Synthesis for Tone Language Systems. In: Vetulani, Z., Mariani, J. (eds) Human Language Technology Challenges for Computer Science and Linguistics. LTC 2011. Lecture Notes in Computer Science, vol 8387. Springer, Cham. https://doi.org/10.1007/978-3-319-08958-4_1


DOI: https://doi.org/10.1007/978-3-319-08958-4_1
Published: 26 July 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08957-7
Online ISBN: 978-3-319-08958-4
eBook Packages: Computer Science, Computer Science (R0)
