Quality Improvements of Zero-Concatenation-Cost Chain Based Unit Selection

Kala, Jiří; Matoušek, Jindřich

doi:10.1007/978-3-319-11581-8_47

Jiří Kala & Jindřich Matoušek

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8773)

Abstract

In our previous work, we introduced a zero-concatenation-cost (ZCC) chain based framework for unit-selection speech synthesis. This framework proved to be very fast, reducing the computational load of a unit-selection system by up to hundreds of times. Since the ZCC chain based algorithm principally prefers to select longer segments of speech, an increased number of audible artifacts was expected to occur at the concatenation points of longer ZCC chains. Indeed, listening tests revealed a number of artifacts in the synthetic speech; however, the artifacts occurred to a similar extent in synthetic speech produced by both the ZCC chain based and the standard Viterbi search algorithms. In this paper, we focus on the sources of the artifacts and propose improvements to the synthetic speech quality within the ZCC algorithm. The quality and computational demands of the improved ZCC algorithm are compared to those of the unit-selection algorithm based on the standard Viterbi search.
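
The notion of a zero concatenation cost can be illustrated with a toy Viterbi search over candidate units. This is only a minimal sketch under stated assumptions, not the authors' ZCC chain algorithm: the `Unit` fields, the pitch-only cost functions, and the constant join penalty are all hypothetical. The one property it tries to show is that two units which were adjacent in the source corpus join at zero cost, so a search tends to favour longer runs of originally consecutive units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    utt: int      # hypothetical: index of the source utterance in the corpus
    pos: int      # hypothetical: position of the unit within that utterance
    pitch: float  # toy feature used by both cost functions


def concat_cost(a: Unit, b: Unit) -> float:
    """Zero if b directly followed a in the corpus, else a toy mismatch penalty."""
    if a.utt == b.utt and b.pos == a.pos + 1:
        return 0.0
    return 1.0 + abs(a.pitch - b.pitch)


def target_cost(target_pitch: float, u: Unit) -> float:
    """Toy target cost: distance between requested and actual pitch."""
    return abs(target_pitch - u.pitch)


def viterbi_select(targets, candidates):
    """Return (best unit sequence, total cost).

    targets: list of target pitches, one per position.
    candidates: list of candidate Unit lists, one list per position.
    """
    # best[i][j] = (cumulative cost, backpointer) for candidate j at position i
    best = [[(target_cost(targets[0], u), None) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + concat_cost(p, u), k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(targets[i], u), back))
        best.append(row)

    # Backtrack from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return path, total


if __name__ == "__main__":
    targets = [100.0, 100.0, 100.0]
    candidates = [
        [Unit(0, 0, 100.0), Unit(1, 5, 100.0)],
        [Unit(0, 1, 101.0), Unit(2, 3, 100.0)],
        [Unit(0, 2, 100.0), Unit(3, 7, 100.0)],
    ]
    path, total = viterbi_select(targets, candidates)
    print([(u.utt, u.pos) for u in path], total)
```

In this toy input the middle unit of utterance 0 matches the target slightly worse than its competitor, yet the search still picks the three consecutive units from utterance 0, because their joins cost nothing. That is the preference for longer segments that the abstract describes.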

Author information

Authors and Affiliations

Dept. of Cybernetics, Faculty of Applied Sciences, University of West Bohemia, Czech Rep.
Jiří Kala & Jindřich Matoušek

Editor information

Editors and Affiliations

Speech and Multimodal Interfaces Laboratory, St. Petersburg Institute of Informatics and Automation of the Russian Academy of Sciences, 39, 14th line, 199178, St. Petersburg, Russia
Andrey Ronzhin
Institute of Applied and Mathematical Linguistics, Moscow State Linguistic University, 38, Ostozhenka, 119034, Moscow, Russia
Rodmonga Potapova
Faculty of Technical Sciences, University of Novi Sad, 6, Trg Dositeja Obradovića, 21000, Novi Sad, Serbia
Vlado Delic

About this paper

Cite this paper

Kala, J., Matoušek, J. (2014). Quality Improvements of Zero-Concatenation-Cost Chain Based Unit Selection. In: Ronzhin, A., Potapova, R., Delic, V. (eds) Speech and Computer. SPECOM 2014. Lecture Notes in Computer Science, vol 8773. Springer, Cham. https://doi.org/10.1007/978-3-319-11581-8_47

DOI: https://doi.org/10.1007/978-3-319-11581-8_47
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11580-1
Online ISBN: 978-3-319-11581-8
eBook Packages: Computer Science, Computer Science (R0)
