Development of syllable-based text to speech synthesis system in Bengali

International Journal of Speech Technology

Abstract

This paper presents the design and development of an unrestricted text-to-speech (TTS) synthesis system for the Bengali language. An unrestricted TTS system is capable of synthesizing good-quality speech across different domains. In this work, syllables are used as the basic units for synthesis, and the Festival framework is used to build the TTS system. Speech collected from a female artist is used as the speech corpus. Initially, speech is collected from five speakers, and a prototype TTS system is built for each of them. The best of the five speakers is selected through subjective and objective evaluation of natural and synthesized waveforms. The unrestricted TTS system is then developed by addressing the issues involved at each stage to produce a good-quality synthesizer. Evaluation is carried out in four stages by conducting objective tests and subjective listening tests on the synthesized speech. In the first stage, the TTS system is built with the basic Festival framework. In the following stages, additional features are incorporated into the system and the quality of synthesis is evaluated. The subjective and objective measures indicate that the proposed features and methods improve the quality of the synthesized speech from stage 2 to stage 4.
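As a rough illustration of the speaker-selection step, the sketch below ranks candidate speakers by the average of their listening-test ratings. It assumes a 1-to-5 mean opinion score (MOS) scale, which the abstract does not specify. The speaker names and ratings are invented for illustration, and the objective measures the paper also uses are not modeled here.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average a list of listener ratings (assumed 1-5 scale) into one MOS value."""
    return mean(ratings)


def select_best_speaker(ratings_by_speaker):
    """Return (speaker, MOS) for the speaker with the highest mean rating."""
    scores = {
        speaker: mean_opinion_score(ratings)
        for speaker, ratings in ratings_by_speaker.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical ratings, invented for illustration only (not data from the paper).
ratings = {
    "speaker_1": [3, 4, 3, 4],
    "speaker_2": [4, 4, 5, 4],
    "speaker_3": [2, 3, 3, 3],
}

best, score = select_best_speaker(ratings)
print(best, score)
```

In practice, a selection like this would likely combine the subjective score with an objective distance measure between natural and synthesized waveforms, as the abstract indicates.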

Corresponding author

Correspondence to N. P. Narendra.

About this article

Cite this article

Narendra, N.P., Rao, K.S., Ghosh, K. et al. Development of syllable-based text to speech synthesis system in Bengali. Int J Speech Technol 14, 167–181 (2011). https://doi.org/10.1007/s10772-011-9094-4

  • DOI: https://doi.org/10.1007/s10772-011-9094-4
