Last Syllable Unit Penalization in Unit Selection TTS

Jůzová, Markéta; Tihelka, Daniel; Skarnitzl, Radek

doi:10.1007/978-3-319-64206-2_36

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10415)

Included in the following conference series:

International Conference on Text, Speech, and Dialogue

Abstract

While unit selection speech synthesis tries to avoid speech modifications, it depends strongly on placing units in the correct position. Usually, the position is tightly coupled with the distance from the beginning or end of some prosodic or rhythmic unit, such as a phrase or a word. The present paper shows, however, that position requirements need not be followed strictly when phonetic knowledge of how prosodic patterns (mostly durational in our case) are perceived is taken into account. In particular, we focus on the effects of using word-final units in word-internal positions in synthesized speech, which listeners often perceive negatively because of disruptions in local timing.
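
The idea named in the title can be illustrated with a minimal sketch: a candidate unit recorded in the last syllable of a word receives an extra cost when the target position is word-internal. The abstract does not give the paper's actual cost formulation, so everything below is an assumption made for illustration: the `Unit` structure, the function names, the additive form of the penalty, and its default weight are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """A candidate unit from the speech corpus (hypothetical structure)."""
    phone: str
    in_last_syllable: bool  # True if recorded in the last syllable of a word


def last_syllable_penalty(candidate: Unit, target_in_last_syllable: bool,
                          penalty: float = 1.0) -> float:
    """Return an extra cost for a word-final unit used word-internally.

    Only the direction described in the abstract is penalized: a candidate
    from a word-final syllable placed in a word-internal target position.
    The penalty value is an illustrative assumption.
    """
    if candidate.in_last_syllable and not target_in_last_syllable:
        return penalty
    return 0.0


def rank_candidates(candidates, base_costs, target_in_last_syllable,
                    penalty=1.0):
    """Sort candidates by base cost plus the last-syllable penalty."""
    scored = [
        (cost + last_syllable_penalty(c, target_in_last_syllable, penalty), c)
        for c, cost in zip(candidates, base_costs)
    ]
    # Sort on the cost only, so Unit objects never need to be compared.
    return [c for _, c in sorted(scored, key=lambda pair: pair[0])]


# Usage: two candidates for a word-internal target position.
final_unit = Unit(phone="a", in_last_syllable=True)
internal_unit = Unit(phone="a", in_last_syllable=False)
ranked = rank_candidates([final_unit, internal_unit], [0.2, 0.5],
                         target_in_last_syllable=False)
```

In this sketch the word-final candidate has the lower base cost, but the penalty moves it behind the word-internal candidate. The mismatch is discouraged rather than forbidden, which matches the abstract's point that strict position matching is not required.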

This research was supported by the Czech Science Foundation (GA CR), project No. GA16-04420S, and by the grant of the University of West Bohemia, project No. SGS-2016-039.

Author information

Authors and Affiliations

New Technologies for the Information Society (NTIS) and Department of Cybernetics, Faculty of Applied Sciences, University of West Bohemia, Pilsen, Czech Republic
Markéta Jůzová & Daniel Tihelka
Faculty of Arts, Institute of Phonetics, Charles University, Prague, Czech Republic
Radek Skarnitzl

Corresponding author

Correspondence to Daniel Tihelka.

Editor information

Editors and Affiliations

University of West Bohemia, Pilsen, Czech Republic
Kamil Ekštein
University of West Bohemia, Pilsen, Czech Republic
Václav Matoušek

Cite this paper

Jůzová, M., Tihelka, D., Skarnitzl, R. (2017). Last Syllable Unit Penalization in Unit Selection TTS. In: Ekštein, K., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2017. Lecture Notes in Computer Science, vol 10415. Springer, Cham. https://doi.org/10.1007/978-3-319-64206-2_36

DOI: https://doi.org/10.1007/978-3-319-64206-2_36
Published: 29 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64205-5
Online ISBN: 978-3-319-64206-2
eBook Packages: Computer Science, Computer Science (R0)
