Prosodic Reading Style Simulation for Text-to-Speech Synthesis

Jokisch, Oliver; Kruschke, Hans; Hoffmann, Rüdiger

doi:10.1007/11573548_55

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3784)

Included in the following conference series:

International Conference on Affective Computing and Intelligent Interaction

Abstract

Simulating different reading styles, mainly by adapting prosodic parameters, can improve the naturalness of synthetic speech and supports more intelligent human-machine interaction. As examples, the article investigates the reading styles News and Tale. For comparison, all examined texts contained the same genre-neutral paragraphs, which were read without a specific style instruction (Normal) and also faster, slower, rather monotonously, or more emotionally, yielding corresponding artificial styles.

The intonation and duration patterns measured for each style in the original recordings control a diphone synthesizer (mapped contours). Additionally, the patterns are used to train a neural network (NN) model.

In two separate listening tests, stimuli were evaluated either as original signals in their respective style or as synthetic speech with mapped or NN-generated prosodic contours. The results show that both original utterances and artificial styles are largely perceived in their intended reading styles. Some reciprocal confusions indicate similarities between styles such as News and Fast, Tale and Slow, and Tale and Expressive. These confusions are more likely for synthetic speech. To produce a complex style such as Tale, different features of the prosodic variations Slow and Expressive are combined. The training method for the synthetic styles requires further improvement.
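The idea of combining features of Slow and Expressive into a Tale-like style can be illustrated with a minimal sketch. It is not the authors' method: the per-phone representation, the function names, and the scaling factors (1.25 for tempo, 1.5 for F0 range) are all hypothetical and chosen only for illustration. The abstract also states that the artificial styles came from recorded readings, not from the simple scaling used here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Prosody:
    """Hypothetical per-phone prosody: one duration and one F0 target each."""
    durations_ms: List[float]
    f0_hz: List[float]


def scale_tempo(p: Prosody, factor: float) -> Prosody:
    # factor > 1 lengthens every phone (slower), factor < 1 shortens it (faster)
    return Prosody([d * factor for d in p.durations_ms], list(p.f0_hz))


def scale_f0_range(p: Prosody, factor: float) -> Prosody:
    # Expand (factor > 1) or compress (factor < 1) F0 excursions around the mean
    mean = sum(p.f0_hz) / len(p.f0_hz)
    return Prosody(list(p.durations_ms),
                   [mean + factor * (f - mean) for f in p.f0_hz])


def combine(duration_source: Prosody, f0_source: Prosody) -> Prosody:
    # Take durations from one style pattern and the F0 contour from another
    if len(duration_source.durations_ms) != len(f0_source.f0_hz):
        raise ValueError("both patterns must cover the same phone sequence")
    return Prosody(list(duration_source.durations_ms), list(f0_source.f0_hz))


# Illustrative values only (assumed, not measured data)
normal = Prosody([80.0, 120.0, 100.0], [110.0, 130.0, 90.0])
slow = scale_tempo(normal, 1.25)           # stand-in for a "Slow" pattern
expressive = scale_f0_range(normal, 1.5)   # stand-in for an "Expressive" pattern
tale_like = combine(slow, expressive)      # durations from Slow, F0 from Expressive
```

In such a setup, the resulting duration and F0 values would be passed to a synthesizer as prosodic targets. A real system would work with measured contours or model parameters in place of the fixed scaling factors.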

Author information

Authors and Affiliations

Laboratory of Acoustics and Speech Communication, Dresden University of Technology, Dresden, Germany
Oliver Jokisch, Hans Kruschke & Rüdiger Hoffmann

Editor information

Editors and Affiliations

National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences,
Jianhua Tao
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Tieniu Tan
MIT Media Laboratory, 20 Ames Street, 02139, Cambridge, MA, USA
Rosalind W. Picard

About this paper

Cite this paper

Jokisch, O., Kruschke, H., Hoffmann, R. (2005). Prosodic Reading Style Simulation for Text-to-Speech Synthesis. In: Tao, J., Tan, T., Picard, R.W. (eds) Affective Computing and Intelligent Interaction. ACII 2005. Lecture Notes in Computer Science, vol 3784. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11573548_55

DOI: https://doi.org/10.1007/11573548_55
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29621-8
Online ISBN: 978-3-540-32273-3
eBook Packages: Computer Science, Computer Science (R0)
