Polish unit selection speech synthesis with BOSS: extensions and speech corpora

Demenko, Grażyna; Klessa, Katarzyna; Szymański, Marcin; Breuer, Stefan; Hess, Wolfgang

doi:10.1007/s10772-010-9071-3

Published: 20 May 2010

International Journal of Speech Technology, Volume 13, pages 85–99 (2010)

Grażyna Demenko¹, Katarzyna Klessa¹, Marcin Szymański², Stefan Breuer³ & Wolfgang Hess³

Abstract

This article presents research and development aimed at creating a Polish speech database for speech synthesis and at adapting BOSS (the Bonn Open Synthesis System) to the Polish language. It first presents the linguistic background for the design of Polish spoken resources for unit selection, together with the transcription and annotation methods applied. The next section details the assumptions behind the Polish corpus, its structure, and its segmental and prosodic annotation. The article then reports the linguistic features used for duration modelling and for the selection of adequate speech units in two Polish modules of BOSS: the duration prediction module (described together with a concise overview of Polish duration modelling for speech technology purposes) and the cost functions module. Finally, the results of two kinds of perception tests are discussed. The first is a preference test evaluating synthesized speech obtained with three variants of speech signal segmentation (automatic, semi-automatic and manual). The second is a mean opinion score test carried out to provide a preliminary assessment of the synthesized speech quality attained with the Polish version of the BOSS synthesizer. The closing section summarizes future perspectives and challenges for Polish TTS (text-to-speech) and further developments of BOSS for Polish.

Author information

Authors and Affiliations

Instytut Językoznawstwa, Uniwersytet im. Adama Mickiewicza, Poznań, Poland
Grażyna Demenko & Katarzyna Klessa
Laboratorium Zintegrowanych Systemów Przetwarzania Języka i Mowy, Poznańskie Centrum Superkomputerowo-Sieciowe, Instytut Chemii Bioorganicznej PAN, Poznań, Poland
Marcin Szymański
Institut für Kommunikationswissenschaften, Abteilung Sprache und Kommunikation, Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
Stefan Breuer & Wolfgang Hess

Corresponding author

Correspondence to Stefan Breuer.

About this article

Cite this article

Demenko, G., Klessa, K., Szymański, M. et al. Polish unit selection speech synthesis with BOSS: extensions and speech corpora. Int J Speech Technol 13, 85–99 (2010). https://doi.org/10.1007/s10772-010-9071-3

Received: 05 April 2010
Accepted: 15 April 2010
Published: 20 May 2010
Issue Date: June 2010
DOI: https://doi.org/10.1007/s10772-010-9071-3
