Investigating Signal Correlation as Continuity Metric in a Syllable Based Unit Selection Synthesis System

Rallabandi, Sai Sirisha; Rallabandi, Sai Krishna; Teertha, Naina; R., Kumaraswamy; Gangashetty, Suryakanth V.

doi:10.1007/978-3-319-43958-7_51

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9811)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

In recent years, text-to-speech (TTS) systems have improved considerably in the quality of their synthetic speech. Data-driven synthesis methods that use the syllable as the basic unit of concatenation have been shown to generate high-quality speech for Indian languages, owing to their prosodic matching function. However, there is still no accepted solution for selecting speech segments optimally with respect to audible discontinuities and human perception. The problem is aggravated when there is not enough data to build the voice, because units are then missing from the inventory. In this paper, we continue our efforts to address this problem by investigating a new continuity measure, based on maximum signal correlation, for optimal unit selection in a concatenative TTS synthesis framework. We explore two formulations for calculating the signal correlation: one based on cross correlation (CC) and one based on the average magnitude difference function (AMDF). We first perform an initial experiment to understand the significance of the approach and then build 5 experimental systems. Evaluations by native speakers on 30 sentences for each of the languages considered show that the proposed continuity measure results in more natural-sounding synthesis.
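
The abstract does not give the exact formulation of the two measures, so the following is only a minimal sketch of how a CC-based and an AMDF-based continuity score between two candidate units might be computed. It assumes each unit is a plain list of samples, that the last frame of the left unit is compared with the first frame of the right unit, and that the best score over a small range of lags is used. The function names, the frame handling, and the lag range are hypothetical and are not taken from the paper.

```python
import math


def max_cross_correlation(left_tail, right_head, max_lag):
    """Peak normalized cross correlation over lags 0..max_lag (higher = smoother join).

    Assumption: the score is the maximum over lags; the paper may define it differently.
    """
    n = min(len(left_tail), len(right_head)) - max_lag
    if n <= 0:
        raise ValueError("frames must be longer than max_lag")
    a = left_tail[:n]
    energy_a = sum(x * x for x in a)
    best = -1.0
    for lag in range(max_lag + 1):
        b = right_head[lag:lag + n]
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(energy_a * sum(y * y for y in b))
        # A silent frame has zero energy; treat its correlation as 0.
        score = num / den if den > 0.0 else 0.0
        best = max(best, score)
    return best


def min_amdf(left_tail, right_head, max_lag):
    """Smallest average magnitude difference over lags 0..max_lag (lower = smoother join)."""
    n = min(len(left_tail), len(right_head)) - max_lag
    if n <= 0:
        raise ValueError("frames must be longer than max_lag")
    a = left_tail[:n]
    best = float("inf")
    for lag in range(max_lag + 1):
        b = right_head[lag:lag + n]
        diff = sum(abs(x - y) for x, y in zip(a, b)) / n
        best = min(best, diff)
    return best


# Toy usage: the end of the left unit is a sinusoid with a 20-sample period.
left = [math.sin(2 * math.pi * k / 20) for k in range(100)]
# Candidate 1 continues the same waveform, offset by 5 samples.
matched = [math.sin(2 * math.pi * (k - 5) / 20) for k in range(100)]
# Candidate 2 has a different period, so it should join less smoothly.
mismatched = [math.sin(2 * math.pi * k / 7) for k in range(100)]

cc_scores = [max_cross_correlation(left, c, 10) for c in (matched, mismatched)]
amdf_scores = [min_amdf(left, c, 10) for c in (matched, mismatched)]
```

In this toy setting the candidate that continues the same waveform gets the higher CC score and the lower AMDF score. In a unit selection system a score of this kind would typically feed into the join cost alongside the target cost, although how the paper combines them is not stated in the abstract.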

Author information

Authors and Affiliations

Speech and Vision Laboratory, International Institute of Information Technology, Hyderabad, India
Sai Sirisha Rallabandi, Sai Krishna Rallabandi & Suryakanth V. Gangashetty
Siddaganga Institute of Technology, Tumkur, Karnataka, India
Naina Teertha & Kumaraswamy R.

Corresponding author

Correspondence to Sai Sirisha Rallabandi.

Editor information

Editors and Affiliations

SPIIRAS, Saint-Petersburg, Russia
Andrey Ronzhin
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova
Budapest University of Technology and Economics, Budapest, Hungary
Géza Németh

About this paper

Cite this paper

Rallabandi, S.S., Rallabandi, S.K., Teertha, N., R., K., Gangashetty, S.V. (2016). Investigating Signal Correlation as Continuity Metric in a Syllable Based Unit Selection Synthesis System. In: Ronzhin, A., Potapova, R., Németh, G. (eds) Speech and Computer. SPECOM 2016. Lecture Notes in Computer Science, vol 9811. Springer, Cham. https://doi.org/10.1007/978-3-319-43958-7_51

DOI: https://doi.org/10.1007/978-3-319-43958-7_51
Published: 13 August 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-43957-0
Online ISBN: 978-3-319-43958-7
eBook Packages: Computer Science, Computer Science (R0)
