
Last Syllable Unit Penalization in Unit Selection TTS

  • Conference paper
Text, Speech, and Dialogue (TSD 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10415)

Abstract

Because unit selection speech synthesis tries to avoid speech modifications, it depends strongly on placing units in the correct position. This position is usually tied closely to the distance from the beginning or end of a prosodic or rhythmic unit, such as a phrase or a word. The present paper shows, however, that such positional requirements need not be strictly followed when phonetic knowledge about the perception of prosodic patterns (mostly durational ones, in our case) is taken into account. In particular, we focus on the effects of using word-final units in word-internal positions in synthesized speech; listeners often perceive such units negatively because they disrupt local timing.
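The abstract does not give the cost function itself. As an illustration only, the following Python sketch shows one way a last-syllable penalty could be expressed as an extra target-cost term inside a standard Viterbi unit search: a candidate recorded in a word-final syllable is charged a fixed penalty when it is placed word-internally, while no other positional constraint is enforced. The `Unit` and `Target` fields, the duration-based target cost, the contiguity-based join cost and both weights (`LAST_SYLLABLE_PENALTY`, `JOIN_COST`) are hypothetical choices made for this example, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Unit:
    """A recorded candidate unit (hypothetical minimal representation)."""
    name: str            # unit label, e.g. a diphone
    word_final: bool     # True if recorded in the last syllable of a word
    duration_ms: float   # duration of the recorded unit
    corpus_index: int    # position in the recorded corpus (used by join_cost)


@dataclass(frozen=True)
class Target:
    """A unit requested by the synthesizer front end (hypothetical)."""
    name: str
    word_final: bool     # True if the target sits in a word's last syllable
    duration_ms: float


LAST_SYLLABLE_PENALTY = 10.0  # hypothetical weight
JOIN_COST = 1.0               # hypothetical cost of a non-contiguous join


def target_cost(unit: Unit, target: Target) -> float:
    # Toy duration mismatch term, scaled so 10 ms of mismatch costs 1.0.
    cost = abs(unit.duration_ms - target.duration_ms) / 10.0
    # Penalize only one direction: a word-final unit used word-internally.
    if unit.word_final and not target.word_final:
        cost += LAST_SYLLABLE_PENALTY
    return cost


def join_cost(prev: Unit, nxt: Unit) -> float:
    # Free join if the two units were adjacent in the recorded corpus.
    return 0.0 if nxt.corpus_index == prev.corpus_index + 1 else JOIN_COST


def select_units(targets: Sequence[Target],
                 candidates: Sequence[Sequence[Unit]]) -> Tuple[List[Unit], float]:
    """Viterbi search for the candidate sequence with minimal total cost."""
    costs = [target_cost(u, targets[0]) for u in candidates[0]]
    backptrs: List[List[int]] = []
    for t in range(1, len(targets)):
        prev = candidates[t - 1]
        new_costs, bp = [], []
        for u in candidates[t]:
            # Best predecessor for candidate u at step t.
            best_i = min(range(len(prev)),
                         key=lambda i: costs[i] + join_cost(prev[i], u))
            new_costs.append(costs[best_i] + join_cost(prev[best_i], u)
                             + target_cost(u, targets[t]))
            bp.append(best_i)
        costs = new_costs
        backptrs.append(bp)
    # Trace back from the cheapest final candidate.
    j = min(range(len(costs)), key=lambda k: costs[k])
    total = costs[j]
    path = [candidates[-1][j]]
    for t in range(len(targets) - 1, 0, -1):
        j = backptrs[t - 1][j]
        path.append(candidates[t - 1][j])
    path.reverse()
    return path, total


targets = [Target("a-b", False, 80.0), Target("b-c", True, 120.0)]
candidates = [
    [Unit("a-b", True, 80.0, 10), Unit("a-b", False, 90.0, 50)],
    [Unit("b-c", True, 120.0, 11), Unit("b-c", True, 100.0, 51)],
]
path, total = select_units(targets, candidates)
print([u.corpus_index for u in path], total)  # [50, 11] 2.0
```

In this toy input, the contiguous pair (corpus indices 10 and 11) would win with a total cost of 0 if the penalty were set to zero; with the penalty in place, the search instead picks the word-internal candidate 50 followed by 11, accepting a join cost rather than a word-final unit in a word-internal position.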

This research was supported by the Czech Science Foundation (GA CR), project No. GA16-04420S, and by a grant from the University of West Bohemia, project No. SGS-2016-039.

Author information

Correspondence to Daniel Tihelka.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Jůzová, M., Tihelka, D., Skarnitzl, R. (2017). Last Syllable Unit Penalization in Unit Selection TTS. In: Ekštein, K., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2017. Lecture Notes in Computer Science, vol 10415. Springer, Cham. https://doi.org/10.1007/978-3-319-64206-2_36

  • DOI: https://doi.org/10.1007/978-3-319-64206-2_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-64205-5

  • Online ISBN: 978-3-319-64206-2

  • eBook Packages: Computer Science, Computer Science (R0)
