Abstract
A synthetic voice personifies the system that uses it. Previous work has shown that sub-corpora with different voice qualities (e.g. tense and lax) can modify the perceived personality of a voice and add expressive and emotional functionality. In this work we explore the use of LPC source/filter decomposition, together with modification of the residual, to artificially add voice quality sub-corpora to a voice without recording bespoke data. We evaluate this artificially enhanced voice against a baseline unit selection voice with pre-recorded sub-corpora. Although artificial modification affects naturalness, it adds emotional range to voices where none was recorded in the source data, addresses the data sparsity issues caused by sub-corpora, and produces significant effects on perceived emotion.
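The general idea of LPC source/filter decomposition with residual modification can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the LPC order, and in particular the first-order tilt filter applied to the residual are assumptions chosen only to show the analysis, modification, and resynthesis steps on a single frame.

```python
import numpy as np

def lpc(frame, order):
    """Autocorrelation-method LPC. Returns a = [1, a1, ..., ap] for A(z)."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]
    # Toeplitz normal equations R a = -r, lightly regularised for stability.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)
    coeffs = np.linalg.solve(R, -r[1 : order + 1])
    return np.concatenate(([1.0], coeffs))

def inverse_filter(frame, a):
    """Source estimate: residual e[n] = sum_k a[k] * x[n-k] (FIR filter A(z))."""
    return np.convolve(np.asarray(frame, dtype=float), a)[: len(frame)]

def synthesis_filter(residual, a):
    """All-pole filter 1/A(z): x[n] = e[n] - sum_{k>=1} a[k] * x[n-k]."""
    out = np.zeros(len(residual))
    p = len(a) - 1
    for n in range(len(residual)):
        acc = residual[n]
        for k in range(1, min(p, n) + 1):
            acc -= a[k] * out[n - k]
        out[n] = acc
    return out

def tilt_residual(residual, alpha):
    """Hypothetical modification (an assumption, not from the paper):
    y[n] = e[n] + alpha * e[n-1]. alpha > 0 favours low frequencies,
    alpha < 0 favours high frequencies."""
    out = np.array(residual, dtype=float)
    out[1:] += alpha * out[:-1].copy()
    return out

def modify_frame(frame, order=16, alpha=-0.5):
    """Decompose a frame, alter the residual, and resynthesise it."""
    frame = np.asarray(frame, dtype=float)
    a = lpc(frame * np.hamming(len(frame)), order)  # window for estimation only
    residual = inverse_filter(frame, a)
    return synthesis_filter(tilt_residual(residual, alpha), a)
```

With `alpha=0` the residual is left unchanged and the frame is reconstructed, which is a convenient sanity check. A full system would also need frame overlap-add and pitch-aware handling, which this sketch omits.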
Acknowledgements
This work was supported by the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 645378 (Aria VALUSPA).
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Buchanan, C.G., Aylett, M.P., Braude, D.A. (2018). Adding Personality to Neutral Speech Synthesis Voices. In: Karpov, A., Jokisch, O., Potapova, R. (eds) Speech and Computer. SPECOM 2018. Lecture Notes in Computer Science, vol 11096. Springer, Cham. https://doi.org/10.1007/978-3-319-99579-3_6
DOI: https://doi.org/10.1007/978-3-319-99579-3_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99578-6
Online ISBN: 978-3-319-99579-3
eBook Packages: Computer Science, Computer Science (R0)