A New Spectral Smoothing Algorithm for Unit Concatenating Speech Synthesis

Kim, Sang-Jin; Jang, Kyung Ae; Han, Hyun Bae; Hahn, Minsoo

doi:10.1007/11589990_57

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3809)

Abstract

Speech unit concatenation with a large database is presently the most popular method for speech synthesis. In this approach, mismatches at unit boundaries are unavoidable and are one cause of quality degradation. This paper proposes an algorithm that reduces undesired discontinuities between successive units. Optimal matching points are calculated in two steps: first, the Kullback-Leibler distance is used for spectral matching; then unit sliding and overlap windowing are used for waveform matching. The proposed algorithm is implemented in a corpus-based, unit-concatenating Korean text-to-speech system with an automatically labeled database. Experimental results show that the algorithm performs better than raw concatenation or the overlap smoothing method.
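
The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact distance formula, window shape, or search range, so the symmetric form of the Kullback-Leibler distance, the raised-cosine cross-fade, the correlation-based slide search, and all function and parameter names below (normalize, symmetric_kl, best_spectral_match, best_shift, crossfade, max_shift) are assumptions chosen for illustration.

```python
import math


def normalize(spectrum, eps=1e-12):
    """Scale a non-negative power spectrum so it sums to 1.

    Values are floored at eps (an assumed constant) to avoid log(0).
    """
    floored = [max(v, eps) for v in spectrum]
    total = sum(floored)
    return [v / total for v in floored]


def symmetric_kl(p_spec, q_spec):
    """Symmetric Kullback-Leibler distance KL(p||q) + KL(q||p)."""
    p, q = normalize(p_spec), normalize(q_spec)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


def best_spectral_match(left_frames, right_frames):
    """Step 1 (assumed form): exhaustive search for the frame pair
    (i in left unit, j in right unit) with the smallest distance."""
    best = None
    for i, lf in enumerate(left_frames):
        for j, rf in enumerate(right_frames):
            d = symmetric_kl(lf, rf)
            if best is None or d < best[2]:
                best = (i, j, d)
    return best


def best_shift(left_tail, right_unit, max_shift):
    """Step 2a (assumed form): slide the right unit by 0..max_shift samples
    and keep the shift whose leading samples correlate best with left_tail."""
    n = len(left_tail)

    def corr(s):
        return sum(a * b for a, b in zip(left_tail, right_unit[s:s + n]))

    return max(range(max_shift + 1), key=corr)


def crossfade(left_tail, right_head):
    """Step 2b (assumed form): overlap two equal-length segments with
    complementary raised-cosine weights that fade left out and right in."""
    n = len(left_tail)
    out = []
    for k in range(n):
        w = 0.5 * (1.0 - math.cos(math.pi * (k + 0.5) / n))  # rises toward 1
        out.append((1.0 - w) * left_tail[k] + w * right_head[k])
    return out
```

In use, one would presumably pick the join frames with best_spectral_match, align the waveforms around them with best_shift, and blend the overlapping samples with crossfade. Because the weights sum to one at every sample, two identical segments pass through the cross-fade unchanged.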


Author information

Authors and Affiliations

Speech and Audio Info. Lab., Information and Communications Univ., Korea
Sang-Jin Kim & Minsoo Hahn
Service Development Lab., KT, Spoken Language Research Team, Korea
Kyung Ae Jang
U-City Planning Department, KT, U-City Planning Center, Korea
Hyun Bae Han

Editor information

Editors and Affiliations

Guangxi Normal University, College of CS and IT, Guilin, China, and University of Technology, Faculty of Engineering and Information Technology, Sydney, Australia
Shichao Zhang
Department of Electrical and Computer Systems Engineering, Monash University, 3800, Melbourne, Victoria, Australia
Ray Jarvis

About this paper

Cite this paper

Kim, S.-J., Jang, K.A., Han, H.B., Hahn, M. (2005). A New Spectral Smoothing Algorithm for Unit Concatenating Speech Synthesis. In: Zhang, S., Jarvis, R. (eds) AI 2005: Advances in Artificial Intelligence. AI 2005. Lecture Notes in Computer Science, vol 3809. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11589990_57

DOI: https://doi.org/10.1007/11589990_57
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30462-3
Online ISBN: 978-3-540-31652-7
eBook Packages: Computer Science, Computer Science (R0)
