Nonlinear Speech Features for the Objective Detection of Discontinuities in Concatenative Speech Synthesis

Pantazis, Yannis; Stylianou, Yannis

doi:10.1007/11520153_21


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3445)

Included in the following conference series:

International School on Neural Networks, Initiated by IIASS and EMFCSC


Abstract

An objective distance measure that can predict audible discontinuities in concatenative speech synthesis systems is very important. Previous results showed that linear approaches are not very effective at detecting audible discontinuities: the best result, obtained with the Kullback-Leibler distance on power spectra, was a detection rate of 37%. In this paper, we present two nonlinear approaches for the detection of discontinuities. The first method is based on a nonlinear harmonic model for speech, while the second is based on the demodulation of speech into an amplitude and a frequency component using the Teager energy operator. Results show that the detection rate can exceed 70%, an improvement of about 95% over previously published results.
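For readers unfamiliar with the two quantities named in the abstract, the following is a minimal sketch of their textbook discrete forms. It is not the authors' implementation: the function names `teager_energy` and `symmetric_kl`, the normalization of each spectrum to unit sum, the symmetrized form of the distance, and the `eps` floor are all assumptions made for illustration.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager-Kaiser energy operator.

    psi[n] = x[n]^2 - x[n-1] * x[n+1], defined for interior samples only,
    so the output is two samples shorter than the input.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def symmetric_kl(p, q, eps=1e-12):
    """Symmetric Kullback-Leibler distance between two power spectra.

    Assumption of this sketch: each spectrum is floored by eps and
    normalized to sum to one before comparison.
    """
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p = p / p.sum()
    q = q / q.sum()
    # KL(p||q) + KL(q||p) collapses to a single sum.
    return float(np.sum((p - q) * np.log(p / q)))
```

For a pure sinusoid A cos(Ωn), the operator returns the constant A² sin²(Ω), which depends on both amplitude and frequency; this is the property that energy-operator demodulation exploits to separate the two components.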



Author information

Authors and Affiliations

Computer Science Department, University of Crete, 71110, Heraklion, Crete, Greece
Yannis Pantazis & Yannis Stylianou


Editor information

Editors and Affiliations

CNRS LTCI/TSI Paris, 46 rue Barrault, 75634, Paris Cedex 13, France
Gérard Chollet
Department of Psychology, Second University of Naples, and IIASS, Via Pellegrino 19, 84019, Vietri sul Mare, SA, Italy
Anna Esposito
Escola Universitària Politècnica de Mataró, Universitat Politècnica de Catalunya, Barcelona, Spain
Marcos Faundez-Zanuy
Dipartimento di Fisica “E.R. Caianiello”, Università degli Studi di Salerno, Via S. Allende, 84081, Baronissi, SA, Italy
Maria Marinaro


About this paper

Cite this paper

Pantazis, Y., Stylianou, Y. (2005). Nonlinear Speech Features for the Objective Detection of Discontinuities in Concatenative Speech Synthesis. In: Chollet, G., Esposito, A., Faundez-Zanuy, M., Marinaro, M. (eds) Nonlinear Speech Modeling and Applications. NN 2004. Lecture Notes in Computer Science, vol 3445. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11520153_21


DOI: https://doi.org/10.1007/11520153_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27441-4
Online ISBN: 978-3-540-31886-6
eBook Packages: Computer Science (R0)
