Interactive Intonation Optimisation Using CMA-ES and DCT Parameterisation of the F0 Contour for Speech Synthesis

Stan, Adriana; Pop, Florin-Claudiu; Cremene, Marcel; Giurgiu, Mircea; Pallez, Denis

doi:10.1007/978-3-642-24094-2_4


Part of the book series: Studies in Computational Intelligence (SCI, volume 387)

Abstract

Expressive speech is one of the latest concerns of text-to-speech systems. Because the realisation of expression and emotion in speech is subjective, humans cannot objectively determine whether one system is more expressive than another. Most text-to-speech systems have rather flat intonation and do not provide the option of changing the output speech. We therefore present an interactive intonation optimisation method based on pitch contour parameterisation and evolution strategies. The Discrete Cosine Transform (DCT) is applied to the phrase-level pitch contour, and the genome is encoded as a vector containing the 7 most significant DCT coefficients. Starting from this initial individual, new speech samples are obtained using an interactive Covariance Matrix Adaptation Evolution Strategy (CMA-ES) algorithm. We evaluate a series of parameters involved in the process, such as the initial standard deviation, the population size, the dynamic expansion of the pitch over the generations, and the naturalness and expressivity of the resulting individuals. The results were evaluated on a Romanian parametric speech synthesiser and provide guidelines for setting up an interactive optimisation system in which users can subjectively select the individual that best suits their expectations with a minimum amount of fatigue.
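
The encoding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the "most significant" coefficients are the first 7 (lowest-order) terms of an orthonormal DCT-II, it uses a made-up F0 contour, and it replaces CMA-ES with a plain isotropic Gaussian perturbation, so no covariance matrix or step-size adaptation takes place. All function names and parameter values (`encode_genome`, `sample_offspring`, `sigma=5.0`, a population of 4) are hypothetical.

```python
import math
import random


def dct2(signal):
    """Orthonormal DCT-II of a sequence of floats."""
    n = len(signal)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, x in enumerate(signal))
        coeffs.append(scale * total)
    return coeffs


def idct2(coeffs, n):
    """Inverse of dct2 for n samples; missing higher-order coefficients count as zero."""
    signal = []
    for i in range(n):
        value = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            value += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        signal.append(value)
    return signal


def encode_genome(f0_contour, n_coeffs=7):
    """Keep the first n_coeffs DCT coefficients (assumption: these are the most significant)."""
    return dct2(f0_contour)[:n_coeffs]


def sample_offspring(genome, sigma, population_size, rng):
    """Perturb the genome with isotropic Gaussian noise (a stand-in for CMA-ES sampling)."""
    return [[g + sigma * rng.gauss(0.0, 1.0) for g in genome]
            for _ in range(population_size)]


# Hypothetical phrase-level F0 contour in Hz: a slow declination plus one accent.
n_frames = 50
f0 = [120.0 - 0.4 * t + 15.0 * math.exp(-((t - 15) ** 2) / 30.0)
      for t in range(n_frames)]

genome = encode_genome(f0)                  # 7-dimensional initial individual
reconstruction = idct2(genome, n_frames)    # smoothed contour from 7 coefficients

rng = random.Random(0)
population = sample_offspring(genome, sigma=5.0, population_size=4, rng=rng)

# Each candidate contour would be synthesised and presented to the listener.
candidates = [idct2(individual, n_frames) for individual in population]
```

In an interactive setting, the listener's choice among the candidates would take the place of a numeric fitness function. A real system would pass that ranking to a CMA-ES implementation rather than the fixed-variance sampler used here.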

Author information

Authors and Affiliations

Communications Department, Technical University of Cluj-Napoca, Cluj, Romania
Adriana Stan, Florin-Claudiu Pop, Marcel Cremene & Mircea Giurgiu
Laboratoire d’Informatique, Signaux, et Systèmes de Sophia-Antipolis (I3S), Université de Nice Sophia-Antipolis, France
Denis Pallez

Editor information

Editors and Affiliations

Dept. of Computer Science and A.I., E.T.S. Ingeniería Informática y de Telecomunicación, University of Granada, C/ Periodista Daniel Saucedo Aranda s/n, 18071, Granada, Spain
David Alejandro Pelta
School of Computer Science, University of Nottingham, Jubilee Campus Wollaton Road, NG8 1BB, Nottingham, UK
Natalio Krasnogor
Center for Cognitive and Neural Studies, Babes-Bolyai University of Cluj Napoca, (Coneural), Str. Ciresilor 29, 400487, Cluj-Napoca, Romania
Dan Dumitrescu
Department of Computer Science, Babes-Bolyai University, Kogalniceanu 1, 400084, Cluj-Napoca, Romania
Camelia Chira
Faculty of Economics and Business Administration, Babes-Bolyai University of Cluj Napoca, Str. Teodor Mihali Nr. 58-60, 400591, Cluj Napoca, Romania
Rodica Lung

About this chapter

Cite this chapter

Stan, A., Pop, FC., Cremene, M., Giurgiu, M., Pallez, D. (2011). Interactive Intonation Optimisation Using CMA-ES and DCT Parameterisation of the F0 Contour for Speech Synthesis. In: Pelta, D.A., Krasnogor, N., Dumitrescu, D., Chira, C., Lung, R. (eds) Nature Inspired Cooperative Strategies for Optimization (NICSO 2011). Studies in Computational Intelligence, vol 387. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24094-2_4

DOI: https://doi.org/10.1007/978-3-642-24094-2_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24093-5
Online ISBN: 978-3-642-24094-2
eBook Packages: Engineering, Engineering (R0)
