Improved Syllable-Based Text to Speech Synthesis for Tone Language Systems

  • Conference paper
Human Language Technology Challenges for Computer Science and Linguistics (LTC 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8387)

Abstract

In this contribution, we document our progress towards a generic and replicable system that is applicable not only to Nigerian languages but also to other African languages. The current system implements a state-of-the-art approach, the Hidden Markov Model (HMM) approach, and aims at a hybridised version whose front-end components would serve other NLP tasks as well as future research and development. As an extension of our LTC’2011 paper, we continue to tackle the language-specific problems and the ‘unity of purpose’ phenomenon for tone language systems, and to improve the speech quality. Specifically, we address issues bordering on tone modelling using syllables as basic synthesis units, with an ‘eye ball’ assessment of the synthesised speech quality. The results of this research offer hope for further improvements, and we envisage an unsupervised system to minimise the labour-intensive aspects of the current design. Also, with the active collaboration network established in the course of this research, we are certain that a more robust system serving a wide variety of applications will evolve.

Acknowledgments

This research has received support from the following grants: the Local Language Speech Technology Initiative (LLSTI) Industry-University grant, the Science and Technology Education Post-Basic (STEP-B)/World Bank assisted Project grant, and the Federal Government of Nigeria (FGN)/Tertiary Education Trust Fund (TETFund) Staff training grant. We also thank Professor Simon King of the Centre for Speech Technology Research (CSTR), University of Edinburgh, Scotland, for agreeing to host part of this research in his laboratory.

Author information

Corresponding author

Correspondence to Moses Ekpenyong.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Ekpenyong, M., Udoh, E., Udosen, E., Urua, EA. (2014). Improved Syllable-Based Text to Speech Synthesis for Tone Language Systems. In: Vetulani, Z., Mariani, J. (eds) Human Language Technology Challenges for Computer Science and Linguistics. LTC 2011. Lecture Notes in Computer Science, vol 8387. Springer, Cham. https://doi.org/10.1007/978-3-319-08958-4_1

  • DOI: https://doi.org/10.1007/978-3-319-08958-4_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08957-7

  • Online ISBN: 978-3-319-08958-4

  • eBook Packages: Computer Science (R0)
