Abstract
In recent years, text-to-speech (TTS) systems have improved considerably in the quality of the synthetic speech they produce. Data-driven synthesis methods that use the syllable as the basic unit of concatenation have been shown to generate high-quality speech for Indian languages, owing to their advantage in prosodic matching. However, there is still no acceptable solution to the optimal selection of speech segments with respect to audible discontinuities and human perception. The problem is aggravated when there is not enough data to build the voice, because units are then missing. In this paper, we continue our efforts to address this problem by investigating a new continuity measure, based on maximum signal correlation, for the optimal selection of units in a concatenative TTS synthesis framework. We explore two formulations for calculating the signal correlation: one based on cross-correlation (CC) and one based on the average magnitude difference function (AMDF). We first perform an initial experiment to understand the significance of the approach and then build five experimental systems. Evaluations by native speakers on 30 sentences in each language show that the proposed continuity measure results in more natural-sounding synthesis.
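To make the two formulations concrete, the following is a minimal Python sketch of how a CC-based and an AMDF-based continuity cost might be computed at a single join. It assumes that the measure compares the last `frame_len` samples of the left unit with the first `frame_len` samples of the right unit and searches over lags up to `max_lag`. The frame length, the lag range, the `1 - max correlation` cost mapping, the greedy `select_next_unit` helper, and the synthetic `sine_tone` signals are all hypothetical choices made for illustration. They are not the paper's exact definitions.

```python
import math
from typing import List, Sequence, Tuple


def sine_tone(freq_hz: float, start: int, count: int, fs: int = 16000) -> List[float]:
    """Hypothetical test signal: `count` samples of a unit-amplitude sine from sample `start`."""
    return [math.sin(2 * math.pi * freq_hz * (start + n) / fs) for n in range(count)]


def _overlap(x: Sequence[float], y: Sequence[float], lag: int) -> Tuple[Sequence[float], Sequence[float]]:
    """Overlapping parts of equal-length frames x and y when y is shifted by `lag` samples."""
    n = len(x)
    if lag >= 0:
        return x[lag:], y[: n - lag]
    return x[: n + lag], y[-lag:]


def max_cross_correlation(x: Sequence[float], y: Sequence[float], max_lag: int) -> float:
    """Largest normalized cross-correlation over lags in [-max_lag, max_lag] (1.0 = same shape)."""
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        a, b = _overlap(x, y, lag)
        norm = math.sqrt(sum(v * v for v in a) * sum(v * v for v in b))
        if norm > 0.0:  # skip silent overlaps, where correlation is undefined
            best = max(best, sum(p * q for p, q in zip(a, b)) / norm)
    return best


def min_amdf(x: Sequence[float], y: Sequence[float], max_lag: int) -> float:
    """Smallest average magnitude difference over lags in [-max_lag, max_lag] (0.0 = identical)."""
    best = math.inf
    for lag in range(-max_lag, max_lag + 1):
        a, b = _overlap(x, y, lag)
        best = min(best, sum(abs(p - q) for p, q in zip(a, b)) / len(a))
    return best


def join_cost(left_unit: Sequence[float], right_unit: Sequence[float],
              frame_len: int, max_lag: int, method: str = "cc") -> float:
    """Hypothetical continuity cost at the join; lower means a smoother concatenation."""
    if not 0 <= max_lag < frame_len <= min(len(left_unit), len(right_unit)):
        raise ValueError("need 0 <= max_lag < frame_len <= unit length")
    tail, head = left_unit[-frame_len:], right_unit[:frame_len]
    if method == "cc":
        return 1.0 - max_cross_correlation(tail, head, max_lag)
    if method == "amdf":
        return min_amdf(tail, head, max_lag)
    raise ValueError(f"unknown method: {method!r}")


def select_next_unit(left_unit: Sequence[float], candidates: Sequence[Sequence[float]],
                     frame_len: int, max_lag: int, method: str = "cc") -> int:
    """Greedy illustration: index of the candidate that joins most smoothly onto `left_unit`."""
    return min(range(len(candidates)),
               key=lambda i: join_cost(left_unit, candidates[i], frame_len, max_lag, method))


if __name__ == "__main__":
    left = sine_tone(200.0, 0, 800)  # 50 ms of a 200 Hz tone at 16 kHz
    # Candidate 0 changes pitch; candidate 1 continues the same waveform.
    candidates = [sine_tone(350.0, 0, 800), sine_tone(200.0, 800, 800)]
    for method in ("cc", "amdf"):
        print(method, select_next_unit(left, candidates, frame_len=160, max_lag=40, method=method))
    # prints "cc 1" then "amdf 1"
```

A real unit-selection synthesizer would add such a join cost to a target cost and search the whole candidate lattice (for example with a Viterbi search) instead of choosing greedily. The greedy helper only isolates the continuity term. Note that the CC form is insensitive to amplitude scaling, whereas the AMDF form also penalizes amplitude mismatch at the join.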
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Rallabandi, S.S., Rallabandi, S.K., Teertha, N., R., K., Gangashetty, S.V. (2016). Investigating Signal Correlation as Continuity Metric in a Syllable Based Unit Selection Synthesis System. In: Ronzhin, A., Potapova, R., Németh, G. (eds) Speech and Computer. SPECOM 2016. Lecture Notes in Computer Science, vol 9811. Springer, Cham. https://doi.org/10.1007/978-3-319-43958-7_51
DOI: https://doi.org/10.1007/978-3-319-43958-7_51
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-43957-0
Online ISBN: 978-3-319-43958-7
eBook Packages: Computer Science, Computer Science (R0)