Abstract
This paper proposes a new method of concatenation cost calculation that improves the optimality of unit selection. Instead of applying the same set of concatenation costs to every type of speech unit transition, costs are defined according to the type of transition. The main types of unit transition that occur in an utterance are voiced-to-voiced, voiced-to-unvoiced and unvoiced-to-unvoiced. A natural measure of continuity is identified for each of these transitions, and the costs are defined accordingly. For voiced-to-voiced transitions, pitch and energy continuity metrics are proposed in addition to spectral continuity. For voiced-to-unvoiced and unvoiced-to-unvoiced transitions, the duration of silence embedded in the unvoiced region is proposed as the continuity metric. This segment-specific concatenation cost calculation improves the quality of syllable-based text-to-speech synthesis. Listening tests support the effectiveness of the proposed method, showing a clear decrease in perceptual discontinuity at joins and an improvement in the overall quality of the synthesised speech.
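The idea of choosing the continuity measure by transition type can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Boundary` fields, the weights, the Euclidean spectral distance, and the `1 / (1 + silence)` mapping (which assumes that more embedded silence masks a join better and should therefore cost less) are all assumptions made for illustration. The abstract does not specify any of these details.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Boundary:
    """Hypothetical features at one side of a join (not from the paper)."""
    voiced: bool
    spectrum: Sequence[float] = ()  # e.g. a cepstral vector at the boundary frame
    pitch_hz: float = 0.0
    energy_db: float = 0.0
    silence_ms: float = 0.0         # silence embedded in an unvoiced region


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def concatenation_cost(left: Boundary, right: Boundary,
                       w_spec: float = 1.0,
                       w_pitch: float = 1.0,
                       w_energy: float = 1.0) -> float:
    """Pick the continuity measure according to the transition type."""
    if left.voiced and right.voiced:
        # Voiced-to-voiced: spectral, pitch and energy continuity.
        return (w_spec * euclidean(left.spectrum, right.spectrum)
                + w_pitch * abs(left.pitch_hz - right.pitch_hz)
                + w_energy * abs(left.energy_db - right.energy_db))
    # Any join involving an unvoiced side: use the embedded silence duration.
    # Assumption: longer silence gives a lower cost.
    silence = sum(b.silence_ms for b in (left, right) if not b.voiced)
    return 1.0 / (1.0 + silence)


# Example joins with made-up feature values.
vv_cost = concatenation_cost(
    Boundary(True, [1.0, 2.0], pitch_hz=120.0, energy_db=60.0),
    Boundary(True, [1.0, 2.0], pitch_hz=125.0, energy_db=58.0),
)
short_gap = concatenation_cost(Boundary(True), Boundary(False, silence_ms=5.0))
long_gap = concatenation_cost(Boundary(True), Boundary(False, silence_ms=40.0))
```

In a unit selection search, a cost of this kind would be evaluated for every candidate pair of adjacent units, so the weights would normally be tuned so that the voiced and unvoiced branches produce values on a comparable scale.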
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Narendra, N.P., Rao, K.S. (2011). Segment Specific Concatenation Cost for Syllable Based Bengali TTS. In: Aluru, S., et al. Contemporary Computing. IC3 2011. Communications in Computer and Information Science, vol 168. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22606-9_38
DOI: https://doi.org/10.1007/978-3-642-22606-9_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22605-2
Online ISBN: 978-3-642-22606-9
eBook Packages: Computer Science (R0)