Harmonic Model for Female Voice Emotional Synthesis

  • Conference paper
Biometric ID Management and Multimodal Communication (BioID 2009)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 5707)

Included in the following conference series:

Abstract

Spectral and prosodic modifications for emotional speech synthesis using harmonic modelling are described. An autoregressive parameterization of the inverse Fourier transform of the log spectral envelope is used. A spectral flatness measure determines the voicing transition frequency, which divides the spectrum of the synthesized speech into a minimum-phase part and a random-phase part of the harmonic model. Female emotional voice conversion is evaluated by a listening test.
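The spectral flatness measure referred to above is the classical ratio of the geometric to the arithmetic mean of the power spectrum (near 1 for noise-like frames, near 0 for strongly harmonic ones). A minimal Python sketch of this measure, together with a purely illustrative mapping of flatness onto a voicing transition frequency, might look as follows; the function names, thresholds, and frequency range are assumptions for illustration, not the paper's actual parameters:

```python
import numpy as np

def spectral_flatness(x, n_fft=512):
    """Spectral flatness measure: geometric mean over arithmetic mean
    of the power spectrum. Close to 1 for flat (noise-like) spectra,
    close to 0 for strongly harmonic spectra."""
    power = np.abs(np.fft.rfft(x, n_fft)) ** 2
    power = np.maximum(power, 1e-12)          # guard against log(0)
    geometric = np.exp(np.mean(np.log(power)))
    arithmetic = np.mean(power)
    return geometric / arithmetic

def voicing_transition_hz(frame, f_low=2000.0, f_high=6000.0):
    """Hypothetical mapping (not from the paper): flatter frames get a
    lower transition frequency, so more of the spectrum is synthesized
    with random phases; harmonic frames keep minimum phases higher up."""
    t = np.clip(spectral_flatness(frame), 0.0, 1.0)
    return f_high - t * (f_high - f_low)
```

For example, a white-noise frame yields a flatness well above that of a pure tone, and so is assigned a lower voicing transition frequency.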




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Přibilová, A., Přibil, J. (2009). Harmonic Model for Female Voice Emotional Synthesis. In: Fierrez, J., Ortega-Garcia, J., Esposito, A., Drygajlo, A., Faundez-Zanuy, M. (eds) Biometric ID Management and Multimodal Communication. BioID 2009. Lecture Notes in Computer Science, vol 5707. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04391-8_6

  • DOI: https://doi.org/10.1007/978-3-642-04391-8_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04390-1

  • Online ISBN: 978-3-642-04391-8

  • eBook Packages: Computer Science, Computer Science (R0)
