Defining a Global Adaptive Duration Target Cost for Unit Selection Speech Synthesis

  • Conference paper
Text, Speech, and Dialogue (TSD 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9302)

Abstract

Unit selection speech synthesis systems generally rely on target and concatenation costs to select the best unit sequence. Although these costs often take contextual features into account, they consist mainly of local distances that are accumulated afterwards. In this paper, we describe a new duration target cost that takes the whole sequence into account. It aims to select a sequence that is good globally, rather than one that is very good almost everywhere but contains a few local leaps in duration cost that are counterbalanced by other units. The problem of weighting this new duration cost against the other sub-costs is also investigated. Experiments showed that the new measure performed well on sentences featuring duration artefacts, without degrading the others.
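The contrast drawn in the abstract, between accumulating local distances and scoring a sequence as a whole, can be illustrated with a toy sketch. This is not the cost defined in the paper: the function names, the error values, and the choice of a root-mean-square score as the "global" measure are all assumptions made purely for illustration. The point is only that a plain sum can rank a sequence with one large duration leap ahead of a uniformly acceptable one, while a sequence-level score that penalises peaks reverses that ranking.

```python
import math

def accumulated_cost(errors):
    """Sum of per-unit duration errors (local distances, accumulated)."""
    return sum(errors)

def global_cost(errors):
    """Hypothetical sequence-level score: root mean square of the errors.

    Assumption for illustration only; squaring makes a single large
    leap weigh more than several small deviations.
    """
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Made-up per-unit duration errors (e.g. in ms) for two candidate sequences.
seq_spiky = [0.0, 0.0, 0.0, 0.0, 40.0]     # perfect almost everywhere, one leap
seq_even = [10.0, 10.0, 10.0, 10.0, 10.0]  # slightly off everywhere

for name, seq in (("spiky", seq_spiky), ("even", seq_even)):
    print(name, accumulated_cost(seq), round(global_cost(seq), 2))
```

With these made-up values the accumulated cost prefers the spiky sequence, while the root-mean-square score prefers the even one.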

Corresponding author

Correspondence to David Guennec.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Guennec, D., Chevelu, J., Lolive, D. (2015). Defining a Global Adaptive Duration Target Cost for Unit Selection Speech Synthesis. In: Král, P., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2015. Lecture Notes in Computer Science, vol 9302. Springer, Cham. https://doi.org/10.1007/978-3-319-24033-6_17

  • DOI: https://doi.org/10.1007/978-3-319-24033-6_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24032-9

  • Online ISBN: 978-3-319-24033-6

  • eBook Packages: Computer Science, Computer Science (R0)
