Adding Personality to Neutral Speech Synthesis Voices

Buchanan, Christopher G.; Aylett, Matthew P.; Braude, David A.

doi:10.1007/978-3-319-99579-3_6

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11096)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

A synthetic voice personifies the system that uses it. Previous work has shown that sub-corpora with different voice qualities (e.g. tense and lax) can be used to modify the perceived personality of a voice and to add expressive and emotional functionality. In this work we explore the use of LPC source/filter decomposition, together with modification of the residual, to artificially add voice quality sub-corpora to a voice without recording bespoke data. We evaluate this artificially enhanced voice against a baseline unit selection voice with pre-recorded sub-corpora. Although artificial modification impacts naturalness, it adds emotional range to voices where none was recorded in the source data, addresses the data sparsity issues caused by sub-corpora, and produces significant effects on perceived emotion.
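The following is a minimal sketch of the general idea of LPC source/filter decomposition with residual modification, not the authors' implementation. It estimates an all-pole filter with the autocorrelation method, inverse-filters the signal to obtain a residual, reshapes the residual, and resynthesises. The function names, the synthetic test signal, the LPC order, and in particular the first-order tilt filter (`tilt_residual`, `alpha`) are assumptions made for illustration; the abstract does not say how the residual is actually modified.

```python
import math
import random

def autocorrelation(x, max_lag):
    """Unnormalised autocorrelation r[0..max_lag] of a list of samples."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]

def lpc(x, order):
    """Levinson-Durbin recursion. Returns a = [1, a1, ..., ap] for the
    inverse filter A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    r = autocorrelation(x, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a

def inverse_filter(x, a):
    """Source estimate: residual e[n] = sum_k a[k] * x[n-k]."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(x))]

def synthesis_filter(e, a):
    """All-pole resynthesis: y[n] = e[n] - sum_{k>=1} a[k] * y[n-k]."""
    y = []
    for n in range(len(e)):
        acc = e[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y

def tilt_residual(e, alpha):
    """Hypothetical residual modification (an assumption, not the paper's
    method): e'[n] = e[n] - alpha * e[n-1], rescaled to the original energy.
    alpha > 0 boosts high frequencies, alpha < 0 attenuates them."""
    shaped = [e[n] - alpha * (e[n - 1] if n > 0 else 0.0)
              for n in range(len(e))]
    e_in = sum(v * v for v in e)
    e_out = sum(v * v for v in shaped)
    gain = math.sqrt(e_in / e_out) if e_out > 0.0 else 1.0
    return [gain * v for v in shaped]

# Synthetic stand-in for one speech frame: two sinusoids plus a little noise.
rng = random.Random(0)
x = [math.sin(2 * math.pi * 0.05 * n)
     + 0.5 * math.sin(2 * math.pi * 0.12 * n)
     + 0.1 * rng.uniform(-1.0, 1.0) for n in range(400)]

a = lpc(x, order=8)
residual = inverse_filter(x, a)
resynth = synthesis_filter(residual, a)  # unmodified round trip
modified = synthesis_filter(tilt_residual(residual, alpha=0.5), a)
```

Filtering the unmodified residual back through the same all-pole filter reproduces the input up to rounding error, so any audible change in `modified` comes from the residual edit alone. A real system would work frame by frame on recorded speech with windowing and overlap-add, which this sketch omits.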

Notes

1. https://github.com/ARIA-VALUSPA/AVP

Acknowledgements

This work was supported by the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 645378 (Aria VALUSPA).

Author information

Authors and Affiliations

CereProc Ltd., CodeBase Floor D, 3 Lady Lawson Street, Edinburgh, EH3 9DR, UK
Christopher G. Buchanan, Matthew P. Aylett & David A. Braude

Corresponding author

Correspondence to Christopher G. Buchanan.

Editor information

Editors and Affiliations

SPIIRAS, St. Petersburg, Russia
Alexey Karpov
Leipzig University of Telecommunications, Leipzig, Germany
Oliver Jokisch
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova

About this paper

Cite this paper

Buchanan, C.G., Aylett, M.P., Braude, D.A. (2018). Adding Personality to Neutral Speech Synthesis Voices. In: Karpov, A., Jokisch, O., Potapova, R. (eds) Speech and Computer. SPECOM 2018. Lecture Notes in Computer Science, vol 11096. Springer, Cham. https://doi.org/10.1007/978-3-319-99579-3_6

DOI: https://doi.org/10.1007/978-3-319-99579-3_6
Published: 25 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99578-6
Online ISBN: 978-3-319-99579-3
eBook Packages: Computer Science, Computer Science (R0)
