Harmonic Model for Female Voice Emotional Synthesis

  • Conference paper
Biometric ID Management and Multimodal Communication (BioID 2009)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 5707)

Included in the following conference series:

Abstract

Spectral and prosodic modifications for emotional speech synthesis using harmonic modelling are described. An autoregressive parameterization of the inverse Fourier transform of the log spectral envelope is used. A spectral flatness measure determines the voicing transition frequency, which divides the spectrum of the synthesized speech into a minimum-phase part and a random-phase part of the harmonic model. Female emotional voice conversion is evaluated by a listening test.
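The spectral flatness measure referred to above is the classical ratio of the geometric to the arithmetic mean of the power spectrum (near 1 for noise-like frames, near 0 for strongly harmonic ones). A minimal Python sketch of this measure, together with a purely illustrative mapping of flatness onto a voicing transition frequency, might look as follows; the function names, thresholds, and frequency range are assumptions for illustration, not the paper's actual parameters:

```python
import numpy as np

def spectral_flatness(x, n_fft=512):
    """Spectral flatness measure: geometric mean over arithmetic mean
    of the power spectrum. Close to 1 for flat (noise-like) spectra,
    close to 0 for strongly harmonic spectra."""
    power = np.abs(np.fft.rfft(x, n_fft)) ** 2
    power = np.maximum(power, 1e-12)          # guard against log(0)
    geometric = np.exp(np.mean(np.log(power)))
    arithmetic = np.mean(power)
    return geometric / arithmetic

def voicing_transition_hz(frame, f_low=2000.0, f_high=6000.0):
    """Hypothetical mapping (not from the paper): flatter frames get a
    lower transition frequency, so more of the spectrum is synthesized
    with random phases; harmonic frames keep minimum phases higher up."""
    t = np.clip(spectral_flatness(frame), 0.0, 1.0)
    return f_high - t * (f_high - f_low)
```

For example, a white-noise frame yields a flatness well above that of a pure tone, and so is assigned a lower voicing transition frequency.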




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Přibilová, A., Přibil, J. (2009). Harmonic Model for Female Voice Emotional Synthesis. In: Fierrez, J., Ortega-Garcia, J., Esposito, A., Drygajlo, A., Faundez-Zanuy, M. (eds) Biometric ID Management and Multimodal Communication. BioID 2009. Lecture Notes in Computer Science, vol 5707. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04391-8_6

  • DOI: https://doi.org/10.1007/978-3-642-04391-8_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04390-1

  • Online ISBN: 978-3-642-04391-8

  • eBook Packages: Computer Science, Computer Science (R0)
