Abstract
In this paper we discuss the procedural problems and challenges involved in developing a generic speech synthesizer for African tone languages. Our development methodology is based on the “MultiSyn” unit-selection approach supported by the Festival Text-To-Speech (TTS) toolkit, applied to Ibibio, a language of the Lower Cross subgroup of the (New) Benue-Congo language family that is widely spoken in the southeastern region of Nigeria. We present, in chronological order, the infrastructural and linguistic problems and challenges identified within the Local Language Speech Technology Initiative (LLSTI) during the development process, from the corpus preparation and refinement stage to the integration and synthesis stage. We provide solutions to most of these challenges and outline directions for further refinement. The evaluation of the initial prototype shows that the synthesis system will be useful to non-literate communities and in a wide spectrum of applications.
Cite this article
Ekpenyong, M.E., Urua, EA. & Gibbon, D. Towards an unrestricted domain TTS system for African tone languages. Int J Speech Technol 11, 87–96 (2008). https://doi.org/10.1007/s10772-009-9037-5
DOI: https://doi.org/10.1007/s10772-009-9037-5