
On the Detection of Discontinuities in Concatenative Speech Synthesis

  • Chapter

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4391)

Abstract

Over the last decade, considerable work has been done on finding an objective distance measure that is able to predict audible discontinuities in concatenative speech synthesis. Speech segments in concatenative synthesis are extracted from disjoint phonetic contexts, so discontinuities in spectral shape and phase mismatches tend to occur at unit boundaries. Many feature sets (most of them spectral in nature) and many distance measures have been tested. However, the reported results showed significant discrepancies. In this paper, we evaluated most of the proposed distances using a single listening experiment. The best score was obtained by an AM&FM decomposition of the speech signal combined with Fisher's linear discriminant.
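The abstract names the best-performing combination (AM&FM features with Fisher's linear discriminant) without describing it. As a rough illustration only, the sketch below estimates instantaneous amplitude and frequency with the DESA-1 energy separation algorithm, which is built on the Teager-Kaiser energy operator, reduces each join to two hypothetical features (the absolute differences of the mean AM and FM estimates on either side of the boundary), and trains a two-class Fisher discriminant on joins labelled continuous or discontinuous. The feature choice, the function names (`desa1`, `join_features`, `fisher_lda_2d`, `is_discontinuous`, `tone`) and any training data are assumptions for illustration, not the chapter's procedure. DESA-1 assumes a roughly mono-component signal, so real speech would typically be band-pass filtered first, and the labels would come from a listening test.

```python
import math


def _psi(s, n):
    """Discrete Teager-Kaiser energy operator: s[n]^2 - s[n-1]*s[n+1]."""
    return s[n] * s[n] - s[n - 1] * s[n + 1]


def desa1(x):
    """DESA-1 energy separation: per-sample instantaneous amplitude and
    frequency (radians/sample) of a roughly mono-component signal x."""
    y = [0.0] + [x[n] - x[n - 1] for n in range(1, len(x))]  # backward difference; y[0] is never read
    amps, freqs = [], []
    for n in range(2, len(x) - 2):
        px = _psi(x, n)
        if px <= 0.0:
            continue  # no usable energy estimate at this sample
        g = 1.0 - (_psi(y, n) + _psi(y, n + 1)) / (4.0 * px)  # estimate of cos(omega)
        g = max(-1.0, min(1.0, g))
        if 1.0 - g * g <= 1e-12:
            continue  # omega near 0 or pi: amplitude estimate is unstable
        freqs.append(math.acos(g))
        amps.append(math.sqrt(px / (1.0 - g * g)))
    return amps, freqs


def _mean(values):
    if not values:
        raise ValueError("cannot average an empty sequence")
    return sum(values) / len(values)


def join_features(left, right):
    """Hypothetical join features: absolute differences of the mean AM and
    mean FM estimates on the two sides of a concatenation point."""
    la, lf = desa1(left)
    ra, rf = desa1(right)
    return (abs(_mean(la) - _mean(ra)), abs(_mean(lf) - _mean(rf)))


def fisher_lda_2d(class0, class1):
    """Two-class Fisher discriminant for 2-D features: w = Sw^-1 (m1 - m0),
    thresholded halfway between the projected class means."""
    m0 = [_mean([p[i] for p in class0]) for i in range(2)]
    m1 = [_mean([p[i] for p in class1]) for i in range(2)]
    sw = [[0.0, 0.0], [0.0, 0.0]]  # within-class scatter matrix
    for points, m in ((class0, m0), (class1, m1)):
        for p in points:
            d = (p[0] - m[0], p[1] - m[1])
            for i in range(2):
                for j in range(2):
                    sw[i][j] += d[i] * d[j]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]  # assumed non-zero
    dm = (m1[0] - m0[0], m1[1] - m0[1])
    # Closed-form 2x2 inverse of Sw applied to the mean difference.
    w = ((sw[1][1] * dm[0] - sw[0][1] * dm[1]) / det,
         (-sw[1][0] * dm[0] + sw[0][0] * dm[1]) / det)
    threshold = 0.5 * (w[0] * (m0[0] + m1[0]) + w[1] * (m0[1] + m1[1]))
    return w, threshold


def is_discontinuous(features, w, threshold):
    """Class 1 (audible discontinuity) projects above the threshold."""
    return w[0] * features[0] + w[1] * features[1] > threshold


def tone(amp, omega, n=200):
    """Synthetic test signal amp*cos(omega*k), omega in radians/sample."""
    return [amp * math.cos(omega * k) for k in range(n)]
```

After training with `w, t = fisher_lda_2d(continuous_joins, discontinuous_joins)`, a candidate join could be scored with `is_discontinuous(join_features(left, right), w, t)`. The closed-form 2x2 inverse keeps the sketch dependency-free; with more features, the system Sw w = m1 - m0 would normally be solved with a numerical library instead.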




Editor information

Yannis Stylianou, Marcos Faundez-Zanuy, Anna Esposito


Copyright information

© 2007 Springer Berlin Heidelberg

About this chapter

Cite this chapter

Pantazis, Y., Stylianou, Y. (2007). On the Detection of Discontinuities in Concatenative Speech Synthesis. In: Stylianou, Y., Faundez-Zanuy, M., Esposito, A. (eds) Progress in Nonlinear Speech Processing. Lecture Notes in Computer Science, vol 4391. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71505-4_6


  • DOI: https://doi.org/10.1007/978-3-540-71505-4_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71503-0

  • Online ISBN: 978-3-540-71505-4

  • eBook Packages: Computer Science, Computer Science (R0)
