Abstract
In this paper, we present the results of a clustering experiment designed to show whether the proximity of pitch contours is a sufficient condition for perceptually smooth transitions at concatenation points in concatenative speech synthesis. The experiment was motivated by a previous finding that support vector machine (SVM) classifiers can separate perceptually continuous and discontinuous joins with high accuracy when pitch contours extracted from the vicinity of concatenation points are used as predictors. The experiment showed that clustering pitch contours represented in different scales, with the Euclidean distance as the metric, is not a reliable way of identifying discontinuities at concatenation points.
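As a rough illustration of the kind of procedure the abstract refers to, the following sketch performs a top-down (divisive) clustering of pitch contours under the Euclidean distance. It is not the authors' implementation: the abstract does not say how clusters are split, so repeated 2-means bisection is assumed here, and the function names and the toy contours (F0 values in Hz sampled around a join) are hypothetical. The toy data are contrived to separate cleanly, which real joins, according to the result reported above, do not.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length contours."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(contours):
    """Point-wise mean of a non-empty list of contours."""
    n = len(contours)
    return [sum(col) / n for col in zip(*contours)]

def scatter(contours):
    """Sum of squared distances to the centroid (within-cluster spread)."""
    c = centroid(contours)
    return sum(euclidean(x, c) ** 2 for x in contours)

def bisect(contours, iters=20):
    """Split one cluster in two with 2-means (assumed splitting rule).

    Seeds are chosen deterministically: the first contour and the
    contour farthest from it.
    """
    seed_a = contours[0]
    seed_b = max(contours, key=lambda x: euclidean(x, seed_a))
    centres = [seed_a, seed_b]
    parts = ([], [])
    for _ in range(iters):
        parts = ([], [])
        for x in contours:
            nearer = 0 if euclidean(x, centres[0]) <= euclidean(x, centres[1]) else 1
            parts[nearer].append(x)
        if not parts[0] or not parts[1]:
            break  # degenerate split, e.g. all contours identical
        new_centres = [centroid(parts[0]), centroid(parts[1])]
        if new_centres == centres:
            break  # assignments have stabilised
        centres = new_centres
    return parts

def divisive_clustering(contours, n_clusters=2):
    """Top-down clustering: repeatedly bisect the most spread-out cluster."""
    clusters = [list(contours)]
    while len(clusters) < n_clusters:
        splittable = [c for c in clusters if len(c) > 1]
        if not splittable:
            break
        target = max(splittable, key=scatter)
        left, right = bisect(target)
        if not left or not right:
            break
        clusters = [c for c in clusters if c is not target]
        clusters.extend([left, right])
    return clusters

# Hypothetical toy contours: three without and two with an F0 jump at the join.
smooth = [[120, 121, 122, 123], [118, 119, 120, 121], [121, 121, 122, 122]]
jumpy = [[120, 121, 140, 141], [119, 120, 138, 139]]
clusters = divisive_clustering(smooth + jumpy, n_clusters=2)
```

In an evaluation along the lines the abstract describes, the resulting clusters would then be compared with listeners' continuity judgements. To study a different pitch scale, the contours would be converted before clustering; the specific scales used in the paper are not stated in the abstract.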
This research was supported by the Grant Agency of the Czech Republic, project No. GAČR 102/09/0989, and by the Technology Agency of the Czech Republic, project No. TA01030476. The work was also supported by a grant of the University of West Bohemia, project No. SGS-2010-054.
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Legát, M., Matoušek, J. (2011). Identifying Concatenation Discontinuities by Hierarchical Divisive Clustering of Pitch Contours. In: Habernal, I., Matoušek, V. (eds) Text, Speech and Dialogue. TSD 2011. Lecture Notes in Computer Science, vol 6836. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23538-2_22
DOI: https://doi.org/10.1007/978-3-642-23538-2_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23537-5
Online ISBN: 978-3-642-23538-2