Abstract
Concatenating speech units selected from a large database is currently the most popular approach to speech synthesis. Mismatches at unit boundaries are unavoidable in this approach and are one cause of quality degradation. This paper proposes an algorithm that reduces undesired discontinuities between consecutive units. Optimal matching points are computed in two steps: first, the Kullback-Leibler distance is used for spectral matching; then unit sliding and overlap windowing are used for waveform matching. The proposed algorithm is implemented in a corpus-based, unit-concatenating Korean text-to-speech system with an automatically labeled database. Experimental results show that the algorithm performs better than both raw concatenation and the overlap smoothing method.
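The two-step matching described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the symmetric form of the Kullback-Leibler distance, the cross-correlation criterion for unit sliding, the raised-cosine overlap window, and all function and parameter names (`symmetric_kl`, `best_spectral_match`, `best_lag`, `crossfade`, `concatenate`, `overlap`, `max_lag`) are assumptions made for illustration.

```python
import math


def symmetric_kl(p, q, eps=1e-12):
    """Symmetric Kullback-Leibler distance between two non-negative spectra.

    Assumption: each spectrum is normalized to sum to 1 and treated as a
    probability distribution; eps avoids log(0) and division by zero.
    """
    sp = sum(p) + eps * len(p)
    sq = sum(q) + eps * len(q)
    total = 0.0
    for a, b in zip(p, q):
        a = (a + eps) / sp
        b = (b + eps) / sq
        total += (a - b) * math.log(a / b)  # KL(p||q) + KL(q||p), term by term
    return total


def best_spectral_match(left_frames, right_frames):
    """Step 1: frame pair (i, j) with the smallest spectral distance."""
    best = None
    for i, lf in enumerate(left_frames):
        for j, rf in enumerate(right_frames):
            d = symmetric_kl(lf, rf)
            if best is None or d < best[0]:
                best = (d, i, j)
    return best[1], best[2]


def best_lag(left_tail, right, max_lag):
    """Step 2a: slide the right unit by 0..max_lag samples and keep the lag
    whose leading samples correlate best with the tail of the left unit."""
    n = len(left_tail)
    chosen, best_score = 0, -math.inf
    for lag in range(max_lag + 1):
        seg = right[lag:lag + n]
        if len(seg) < n:
            break
        score = sum(a * b for a, b in zip(left_tail, seg))
        if score > best_score:
            chosen, best_score = lag, score
    return chosen


def crossfade(left, right, overlap):
    """Step 2b: join two waveforms with a raised-cosine overlap window."""
    if overlap < 1 or overlap > len(left) or overlap > len(right):
        raise ValueError("overlap must be between 1 and the unit lengths")
    out = list(left[:len(left) - overlap])
    for k in range(overlap):
        w = 0.5 * (1.0 - math.cos(math.pi * (k + 0.5) / overlap))  # rises 0 -> 1
        out.append((1.0 - w) * left[len(left) - overlap + k] + w * right[k])
    out.extend(right[overlap:])
    return out


def concatenate(left, right, overlap, max_lag):
    """Slide the right unit to its best lag, then overlap-add it to the left."""
    lag = best_lag(left[-overlap:], right, max_lag)
    return crossfade(left, right[lag:], overlap)
```

In this sketch the spectral step selects which frames to join and the waveform step aligns phase before windowing, so a periodic signal that is merely shifted between the two units is joined without a visible seam.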
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kim, SJ., Jang, K.A., Han, H.B., Hahn, M. (2005). A New Spectral Smoothing Algorithm for Unit Concatenating Speech Synthesis. In: Zhang, S., Jarvis, R. (eds) AI 2005: Advances in Artificial Intelligence. AI 2005. Lecture Notes in Computer Science, vol 3809. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11589990_57
DOI: https://doi.org/10.1007/11589990_57
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30462-3
Online ISBN: 978-3-540-31652-7
eBook Packages: Computer Science, Computer Science (R0)