
Nonlinear Speech Features for the Objective Detection of Discontinuities in Concatenative Speech Synthesis

  • Conference paper
Nonlinear Speech Modeling and Applications (NN 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3445)


Abstract

An objective distance measure that can predict audible discontinuities in concatenative speech synthesis systems is very important. Previous results showed that linear approaches are not very effective at detecting audible discontinuities: the best result, a detection rate of 37%, was obtained using the Kullback-Leibler distance on power spectra. In this paper, we present two nonlinear approaches for the detection of discontinuities. The first method is based on a nonlinear harmonic model of speech, while the second is based on demodulating speech into an amplitude and a frequency component using the Teager energy operator. Results show that the detection rate can exceed 70%, a relative improvement of about 95% over previously published results.
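The second method relies on Teager-energy demodulation. The sketch below shows the discrete Teager energy operator, Ψ[x(n)] = x(n)² − x(n−1)·x(n+1), together with the DESA-2 energy separation algorithm of Maragos, Kaiser and Quatieri, which splits a single AM-FM component into instantaneous amplitude and frequency. It is not the authors' implementation: the abstract does not say which separation variant or distance they use, and the function names, the `join_feature_deltas` comparison of mean AM/FM values on either side of a join, and the 400-sample window are hypothetical choices for illustration. Real speech would first need band-pass filtering (for example around a formant) so that each band is approximately a single AM-FM component; the sketch skips that step and tests on pure tones.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]**2 - x[n-1] * x[n+1].

    The output has len(x) - 2 samples and is aligned with x[1:-1].
    For x[n] = A*cos(W*n + phi) it equals A**2 * sin(W)**2 at every sample.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def desa2(x, fs, eps=1e-12):
    """DESA-2 energy separation for a single AM-FM component below fs/4.

    Returns (amplitude, frequency_hz), both aligned with x[2:-2].
    """
    x = np.asarray(x, dtype=float)
    psi_x = teager_energy(x)[1:-1]   # trim so it is aligned with x[2:-2]
    y = x[2:] - x[:-2]               # symmetric difference, aligned with x[1:-1]
    psi_y = teager_energy(y)         # aligned with x[2:-2]
    psi_x = np.maximum(psi_x, eps)   # guard against zero or negative energy
    psi_y = np.maximum(psi_y, eps)
    cos_2w = np.clip(1.0 - psi_y / (2.0 * psi_x), -1.0, 1.0)
    omega = 0.5 * np.arccos(cos_2w)  # instantaneous frequency in rad/sample
    amplitude = 2.0 * psi_x / np.sqrt(psi_y)
    return amplitude, omega * fs / (2.0 * np.pi)


def join_feature_deltas(x, join, fs, win=400):
    """Hypothetical join measure (not from the paper): demodulate `win` samples
    on each side of the concatenation point separately and return the absolute
    differences of the mean amplitude and mean frequency (Hz)."""
    left_amp, left_f = desa2(x[join - win:join], fs)
    right_amp, right_f = desa2(x[join:join + win], fs)
    return (float(abs(left_amp.mean() - right_amp.mean())),
            float(abs(left_f.mean() - right_f.mean())))


if __name__ == "__main__":
    fs = 16000
    n = np.arange(800)
    smooth = 0.5 * np.cos(2 * np.pi * 440 * np.arange(1600) / fs)
    jump = np.concatenate([0.5 * np.cos(2 * np.pi * 440 * n / fs),
                           0.3 * np.cos(2 * np.pi * 600 * n / fs)])
    print(join_feature_deltas(smooth, 800, fs))  # both deltas near 0
    print(join_feature_deltas(jump, 800, fs))    # roughly (0.2, 160.0)
```

Each side of the join is demodulated separately so that the phase jump at the concatenation point does not produce spurious spikes in the Teager energy; a real detector would still need a threshold or classifier on top of such features.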




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pantazis, Y., Stylianou, Y. (2005). Nonlinear Speech Features for the Objective Detection of Discontinuities in Concatenative Speech Synthesis. In: Chollet, G., Esposito, A., Faundez-Zanuy, M., Marinaro, M. (eds) Nonlinear Speech Modeling and Applications. NN 2004. Lecture Notes in Computer Science, vol 3445. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11520153_21


  • DOI: https://doi.org/10.1007/11520153_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-27441-4

  • Online ISBN: 978-3-540-31886-6

  • eBook Packages: Computer Science (R0)
