Abstract
The Bonn Open Synthesis System (BOSS) is an open-source software distribution for unit selection speech synthesis that aims to be easily extensible to new target languages and different applications. To achieve this flexibility, many aspects of the software have been changed in recent years, including the addition of a refined interface to synthesis modules and a stricter separation of language-specific and language-independent code. This article gives an overview of the architecture from a technical perspective and explains how it can be adapted to a particular purpose and voice. The overview is preceded by a short introduction to the unit selection paradigm in general and a section on the specifics of the approach taken by BOSS. Particular attention is given to the extensions made for the integration of Polish, during which some of these measures to increase flexibility were carried out. Further information on the application to Polish, with an emphasis on the linguistic, phonetic and acoustic aspects and on the speech corpus used, can be found in the second part of this two-part article, “Polish unit selection speech synthesis with BOSS”, also published in this issue of the Journal.
Additional information
S. Breuer now with Phonetics Arts Ltd., Cambridge, UK.
About this article
Cite this article
Breuer, S., Hess, W. The Bonn Open Synthesis System 3. Int J Speech Technol 13, 75–84 (2010). https://doi.org/10.1007/s10772-010-9072-2
DOI: https://doi.org/10.1007/s10772-010-9072-2