Abstract
In this contribution, we document our progress towards a generic and replicable text-to-speech system that is applicable not only to Nigerian languages but also to other African languages. The current system implements a state-of-the-art approach based on Hidden Markov Models (HMMs) and aims at a hybridised version whose front-end components would serve other NLP tasks as well as future research and development. As an extension of our LTC’2011 paper, we continue to tackle language-specific problems and the ‘unity of purpose’ phenomenon for tone language systems, and to improve speech quality. Specifically, we address issues of tone modelling using syllables as the basic synthesis units, with an ‘eyeball’ assessment of the synthesised speech quality. The results offer hope for further improvements, and we envisage an unsupervised system that minimises the labour-intensive aspects of the current design. Also, with the active collaboration network established in the course of this research, we are confident that a more robust system serving a wide variety of applications will evolve.
References
Masuko, T.: HMM-based speech synthesis and its applications. Ph.D. thesis, Tokyo, Japan (2002)
Yoshimura, T., Tokuda, K., Masuko, T., Kobayashi, T., Kitamura, T.: Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis. In: EUROSPEECH Conference (1999)
Zen, H., Toda, T., Nakamura, M., Tokuda, K.: Details of Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005. IEICE Trans. Inf. Syst. E90–D(1), 325–333 (2007)
Ling, Z.-H., Wu, Y.-J., Wang, Y.-P., Qin, L., Wang, R.H.: USTC system for Blizzard Challenge 2006: an improved HMM based speech synthesis method. In: Blizzard Challenge (2006)
Black, A., Zen, H., Tokuda, K.: Statistical parametric synthesis. In: ICASSP, Hawaii, pp. 1229–1232 (2007)
Raitio, T.: Hidden Markov model based Finnish text-to-speech system utilizing glottal inverse filtering. M.Sc. thesis, Espoo, Finland (2008)
Guan, Y., Tian, J., Wu, Y.-J., Yamagishi, J., Nurminen, J.: A unified and automatic approach to Mandarin HTS system. In: 7th ISCA Speech Synthesis Workshop, pp. 1–5 (2010)
King, S.: A Tutorial on HMM Speech Synthesis (Invited Paper). In: Sadhana - Academy Proceedings in Engineering Sciences, Indian Institute of Sciences (2010)
Zen, H., Oura, K., Nose, T., Yamagishi, J., Sako, S., Toda, T., Masuko, T., Black, A.W., Tokuda, K.: Recent development of the HMM-based speech synthesis system (HTS). In: APSIPA Annual Summit and Conference, Sapporo, Japan, pp. 121–130 (2009)
Fukada, T., Tokuda, K., Kobayashi, T., Imai, S.: An adaptive algorithm for mel-cepstral analysis of speech. In: ICASSP, pp. 137–140 (1992)
Tokuda, K., Masuko, T., Miyazaki, N., Kobayashi, T.: Hidden Markov models based on multi-space probability distribution for pitch pattern modeling. In: Acoustics, Speech, and Signal Processing, vol. 1, pp. 229–232 (1999)
Imai, S.: Cepstral analysis synthesis on the mel frequency scale. In: ICASSP’83, pp. 93–96 (1983)
Ekpenyong, M., Urua, E.-A., Udosen, E., Udoh, E.: Adaptable phone and syllable HMM-based Ibibio TTS systems. In: Vetulani, Z. (ed.) 5th Language and Technology Conference (LTC), Poznan, Poland, Fundacja Uniwersytetu im. A. Mickiewicza, pp. 355–360 (2011)
Essien, O.E.: A Grammar of the Ibibio Language. University Press Limited, Ibadan (1990)
Simmons, D.: Ibibio verb morphology. Afr. Stud. 16(1), 1–19 (1957)
Urua, E.E.: Aspects of Ibibio phonology and morphology. Ph.D. thesis, Ibadan, Nigeria (1990)
Akinlabi, A., Urua, E.: Foot structure in Ibibio verb. J. Afr. Lang. Linguist. 23, 119–160 (2002). Walter De Gruyter
Ekpenyong, M., Udoh, E.O.: Morpho-syntactic analysis framework for tone language text-to-speech systems. Comput. Inf. Sci. 5(4), 83–101 (2012)
Louw, J.A.: Speect: a multilingual text-to-speech system. In: Proceedings of 19th Annual Symposium of the Pattern Recognition Association of South Africa (PRASA), Cape Town, pp. 165–168 (2008)
Ekpenyong, M.E.: Speech synthesis for tone language systems. Ph.D. thesis, University of Uyo, Nigeria (2013)
Zen, H.: An example context-dependent label format for HMM-based speech synthesis in English. https://wiki.inf.ed.ac.uk/twiki/pub/CSTR/F0parametrisation/hts_lab_format.pdf (2006). Accessed 19 May 2011
Ekpenyong, M., Urua, E.-A., Watts, O., King, S., Yamagishi, J.: Statistical parametric speech synthesis for Ibibio. Speech Commun. First online: February 2013. doi:10.1016/j.specom.2013.02.003
Kawahara, H., Masuda-Katsuse, I., de Cheveigné, A.: Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: possible role of a repetitive structure in sounds. Speech Commun. 27(3), 187–207 (1999)
Yamagishi, J., Nose, T., Zen, H., Ling, Z., Toda, T., Tokuda, K., King, S., Renals, S.: Robust speaker-adaptive HMM-based text-to-speech synthesis. IEEE Trans. Audio Speech Lang. Process. 17(6), 1208–1230 (2009)
Gibbon, D., Urua, E.-A., Ekpenyong, M.: Problems and solutions in African tone language text-to-speech. In: International Tutorial and Research Workshop on Multilingual Speech and Language Processing, Stellenbosch, paper 14 (2006)
Ekpenyong, M., Urua, E.-A., Gibbon, D.: Towards an unrestricted domain TTS system for African tone languages. Int. J. Speech Technol. 11, 87–96 (2008)
Acknowledgments
This research has received support from the following grants: the Local Language Speech Technology Initiative (LLSTI) Industry-University grant, the Science and Technology Education Post-Basic (STEP-B)/World Bank-assisted Project grant, and the Federal Government of Nigeria (FGN)/Tertiary Education Trust Fund (TETFund) Staff Training grant. We also thank Professor Simon King of the Centre for Speech Technology Research (CSTR), University of Edinburgh, Scotland, for agreeing to host part of this research in his laboratory.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Ekpenyong, M., Udoh, E., Udosen, E., Urua, EA. (2014). Improved Syllable-Based Text to Speech Synthesis for Tone Language Systems. In: Vetulani, Z., Mariani, J. (eds) Human Language Technology Challenges for Computer Science and Linguistics. LTC 2011. Lecture Notes in Computer Science, vol 8387. Springer, Cham. https://doi.org/10.1007/978-3-319-08958-4_1
DOI: https://doi.org/10.1007/978-3-319-08958-4_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08957-7
Online ISBN: 978-3-319-08958-4
eBook Packages: Computer Science (R0)