Uncertainty of Phone Voicing and Its Impact on Speech Synthesis

Tihelka, Daniel; Hanzlíček, Zdeněk; Jůzová, Markéta

doi:10.1007/978-3-030-60276-5_56

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12335)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

While unit selection speech synthesis is no longer at the centre of research, it shows its strengths in deployments where fast fixes and tuning possibilities are required. The key part of the method is its target and concatenation costs, which usually consist of manually designed features. When a feature design is flawed, the selection may behave in an unexpected way without necessarily degrading the quality of the speech output. One such feature in our systems was the required match between the expected and the real voicing of units. Thanks to the flexibility of the method, we were able to narrow the behaviour of the selection algorithm without worsening the quality of the synthesised speech.
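The abstract does not spell out the cost formulation. As a minimal sketch, assuming the common unit selection setup in which the best unit sequence minimises the sum of per-unit target costs and per-join concatenation costs, a voicing-match feature could enter the target cost as shown below. All names (`Unit`, `target_cost`, `concat_cost`, `select_units`), the pitch-based join cost, and the weights are hypothetical illustrations, not the features or values used by the authors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """Hypothetical candidate unit from a speech database."""
    name: str      # label of the recorded unit
    voiced: bool   # voicing actually detected in the recording
    pitch: float   # pitch in Hz at the unit boundary (ignored if unvoiced)


def target_cost(expected_voiced, cand, w_voicing=1.0):
    # Assumed feature: penalise a mismatch between expected and real voicing.
    return w_voicing * float(expected_voiced != cand.voiced)


def concat_cost(left, right, w_pitch=0.01):
    # Assumed feature: pitch discontinuity, counted only between voiced units.
    if left.voiced and right.voiced:
        return w_pitch * abs(left.pitch - right.pitch)
    return 0.0


def select_units(expected_voicing, candidates, w_voicing=1.0):
    """Viterbi search for the cheapest sequence, one candidate per position.

    expected_voicing: list of bools, one per target position.
    candidates: list of non-empty candidate lists, one per target position.
    Returns (total_cost, list_of_selected_units).
    """
    # Each entry holds the cheapest path that ends in a given candidate.
    best = [(target_cost(expected_voicing[0], c, w_voicing), [c])
            for c in candidates[0]]
    for expected, cands in zip(expected_voicing[1:], candidates[1:]):
        new_best = []
        for c in cands:
            # Cheapest predecessor path, including the join cost to c.
            cost, path = min(
                ((prev_cost + concat_cost(prev_path[-1], c), prev_path)
                 for prev_cost, prev_path in best),
                key=lambda item: item[0],
            )
            new_best.append((cost + target_cost(expected, c, w_voicing),
                             path + [c]))
        best = new_best
    return min(best, key=lambda item: item[0])
```

In this sketch, setting `w_voicing` to zero removes the voicing requirement entirely, which is one simple way to explore how strongly such a feature steers the selection.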

Acknowledgements

This research was supported by the Technology Agency of the Czech Republic (project No. TH02010307) and by a grant of the University of West Bohemia (project No. SGS-2019-027).

Author information

Authors and Affiliations

New Technologies for the Information Society, University of West Bohemia, Pilsen, Czech Republic
Daniel Tihelka & Zdeněk Hanzlíček
Department of Cybernetics, Faculty of Applied Sciences, University of West Bohemia, Pilsen, Czech Republic
Markéta Jůzová

Corresponding author

Correspondence to Daniel Tihelka.

Editor information

Editors and Affiliations

St. Petersburg Institute for Informatics and Automation, Russian Academy of Sciences, St. Petersburg, Russia
Alexey Karpov
Institute for Applied and Mathematical Linguistics, Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova

About this paper

Cite this paper

Tihelka, D., Hanzlíček, Z., Jůzová, M. (2020). Uncertainty of Phone Voicing and Its Impact on Speech Synthesis. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2020. Lecture Notes in Computer Science, vol 12335. Springer, Cham. https://doi.org/10.1007/978-3-030-60276-5_56

DOI: https://doi.org/10.1007/978-3-030-60276-5_56
Published: 29 September 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60275-8
Online ISBN: 978-3-030-60276-5
eBook Packages: Computer Science; Computer Science (R0)
