On the Detection of Discontinuities in Concatenative Speech Synthesis

Pantazis, Yannis; Stylianou, Yannis

doi:10.1007/978-3-540-71505-4_6


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4391)

Abstract

Over the last decade, considerable work has gone into finding an objective distance measure that can predict audible discontinuities in concatenative speech synthesis. Speech segments in concatenative synthesis are extracted from disjoint phonetic contexts, and discontinuities in spectral shape and phase mismatches tend to occur at unit boundaries. Many feature sets, most of them spectral in nature, and many distances have been tested. However, there were significant discrepancies among the reported results. In this paper, we test most of the proposed distances using the same listening experiment. The best scores were obtained with an AM&FM decomposition of the speech signal combined with Fisher's linear discriminant.
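
As a rough illustration of the last point, the sketch below shows how a two-class Fisher linear discriminant can turn a small feature vector measured at a join into a single score with a decision threshold. It is not the authors' implementation: the AM&FM feature extraction is not reproduced, the two features per join are hypothetical, and every number in `smooth` and `audible` is invented for the example.

```python
# Minimal two-class Fisher linear discriminant in two dimensions (pure Python).
# ASSUMPTION: each join is described by two hypothetical distance features.
# The values below are made up for illustration; they are not data from the chapter.

def mean(vectors):
    """Component-wise mean of a list of 2-D vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(2)]

def scatter(vectors, mu):
    """2x2 scatter matrix: sum over samples of (x - mu)(x - mu)^T."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for v in vectors:
        d = [v[0] - mu[0], v[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def fisher_discriminant(class0, class1):
    """Return (w, threshold) with w = Sw^{-1} (m1 - m0)."""
    m0, m1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    # Within-class scatter is the sum of the per-class scatter matrices.
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [dot(inv[0], diff), dot(inv[1], diff)]
    # Simple choice of threshold: midpoint between the projected class means.
    threshold = 0.5 * (dot(w, m0) + dot(w, m1))
    return w, threshold

def is_discontinuous(x, w, threshold):
    """Classify a join as audibly discontinuous if its projection exceeds the threshold."""
    return dot(w, x) > threshold

# Hypothetical training joins: listeners heard no discontinuity / heard one.
smooth = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.30), (0.25, 0.25)]
audible = [(0.70, 0.60), (0.80, 0.90), (0.60, 0.80), (0.90, 0.70)]

w, threshold = fisher_discriminant(smooth, audible)
print("projection vector:", w)
print("threshold:", threshold)
print("new join (0.65, 0.75) flagged:", is_discontinuous((0.65, 0.75), w, threshold))
```

The midpoint threshold is only one possible choice. A real evaluation would pick the operating point from listener judgments, for example by trading detection rate against false alarms.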



Author information

Authors and Affiliations

University of Crete, Computer Science Department, Heraklion, Crete, 71409, Greece
Yannis Pantazis & Yannis Stylianou


Editor information

Yannis Stylianou, Marcos Faundez-Zanuy, Anna Esposito


About this chapter

Cite this chapter

Pantazis, Y., Stylianou, Y. (2007). On the Detection of Discontinuities in Concatenative Speech Synthesis. In: Stylianou, Y., Faundez-Zanuy, M., Esposito, A. (eds) Progress in Nonlinear Speech Processing. Lecture Notes in Computer Science, vol 4391. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71505-4_6


DOI: https://doi.org/10.1007/978-3-540-71505-4_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71503-0
Online ISBN: 978-3-540-71505-4
eBook Packages: Computer Science, Computer Science (R0)
