Spectrum Modification for Emotional Speech Synthesis

  • Conference paper
Multimodal Signals: Cognitive and Algorithmic Issues

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5398)

Abstract

The emotional state of a speaker is accompanied by physiological changes affecting respiration, phonation, and articulation. These changes are manifested mainly in the prosodic patterns of F0, energy, and duration, but also in segmental parameters of the speech spectrum. Therefore, our new emotional speech synthesis method is supplemented with spectrum modification. It comprises a non-linear frequency scale transformation of the speech spectral envelope, filtering that emphasizes the low or high frequency range, and control of spectral noise by a spectral flatness measure, informed by findings from psychological and phonetic research. The proposed spectral modification is combined with linear modification of F0 mean, F0 range, energy, and duration. Speech resynthesis with the applied modifications, intended to represent joy, anger, and sadness, is evaluated by a listening test.
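The abstract mentions two concrete ingredients: a spectral flatness measure (SFM) used to control spectral noise, and a linear modification of F0 mean and range. As a rough illustrative sketch only (this is not the authors' implementation; the function names and the convention that zero marks unvoiced frames are assumptions), these two ingredients could look like:

```python
import math

def spectral_flatness(power_spectrum):
    """Spectral flatness measure: the ratio of the geometric mean to the
    arithmetic mean of the power spectrum. Values near 1 indicate a
    noise-like (flat) spectrum; values near 0 indicate a tonal one."""
    eps = 1e-12  # guard against log(0) and division by zero
    n = len(power_spectrum)
    geometric = math.exp(sum(math.log(p + eps) for p in power_spectrum) / n)
    arithmetic = sum(power_spectrum) / n
    return geometric / (arithmetic + eps)

def modify_f0(f0_contour, mean_scale, range_scale):
    """Linear F0 modification: scale the contour mean by mean_scale and
    stretch or compress the excursions around that mean by range_scale.
    Frames with F0 == 0 are treated as unvoiced and left untouched."""
    voiced = [f for f in f0_contour if f > 0]
    if not voiced:
        return list(f0_contour)
    old_mean = sum(voiced) / len(voiced)
    new_mean = old_mean * mean_scale
    return [new_mean + range_scale * (f - old_mean) if f > 0 else 0.0
            for f in f0_contour]
```

For a flat (noise-like) power spectrum the SFM approaches 1, while for a strongly tonal spectrum it approaches 0, which is the property that makes it usable as a control signal for spectral noise.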



Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Přibilová, A., Přibil, J. (2009). Spectrum Modification for Emotional Speech Synthesis. In: Esposito, A., Hussain, A., Marinaro, M., Martone, R. (eds.) Multimodal Signals: Cognitive and Algorithmic Issues. Lecture Notes in Computer Science, vol. 5398. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00525-1_23

  • DOI: https://doi.org/10.1007/978-3-642-00525-1_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00524-4

  • Online ISBN: 978-3-642-00525-1
