Toward a Rule-Based Synthesis of Vietnamese Emotional Speech

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 326)

Abstract

This paper presents a framework for simulating four basic emotional styles of Vietnamese speech by applying acoustic feature transplantation techniques to neutral utterances. First, it describes analyses of the acoustic features of Vietnamese emotional speech, carried out to find the relations between variations in prosody and voice quality and emotional states. Then, target pitch profiles, together with duration, energy, and spectrum constraints, were obtained by applying rules inferred from the analysis results. These rules are based on the idea that when emotional speech is synthesized from neutral speech, acoustic features are modified more in some syllables than in others, rather than uniformly across all syllables. Neutral speech was then morphed to produce synthesized emotional speech. Results of perceptual tests show that the emotional styles were well recognized.
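The idea of modifying some syllables more than others can be illustrated with a minimal sketch. This is not the paper's rule set: the `Syllable` structure, the `emphasized` flag, the function `apply_rule`, and all scale factors are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Syllable:
    """One syllable of a neutral utterance (hypothetical structure)."""
    f0_hz: List[float]   # sampled pitch contour in Hz
    duration_ms: float   # syllable duration in milliseconds
    emphasized: bool     # True if a rule selects this syllable for stronger change


def apply_rule(syllables: List[Syllable],
               f0_strong: float, f0_weak: float,
               dur_strong: float, dur_weak: float) -> List[Syllable]:
    """Scale pitch and duration per syllable, non-uniformly.

    Selected syllables get the 'strong' factors; all others get the
    'weak' factors. The factors are illustrative assumptions, not
    values reported in the paper.
    """
    out = []
    for syl in syllables:
        f0_scale = f0_strong if syl.emphasized else f0_weak
        dur_scale = dur_strong if syl.emphasized else dur_weak
        out.append(Syllable(
            f0_hz=[f * f0_scale for f in syl.f0_hz],
            duration_ms=syl.duration_ms * dur_scale,
            emphasized=syl.emphasized,
        ))
    return out


# Usage: raise pitch and shorten duration mostly on the second syllable.
neutral = [
    Syllable([100.0, 110.0], 200.0, False),
    Syllable([120.0], 250.0, True),
]
target = apply_rule(neutral, f0_strong=1.5, f0_weak=1.1,
                    dur_strong=0.8, dur_weak=1.0)
```

In a full system, the resulting per-syllable targets would drive a speech morphing step. The sketch covers only the target computation for pitch and duration and omits the energy and spectrum constraints.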

Author information

Corresponding author

Correspondence to Thi Duyen Ngo.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Ngo, T.D., Akagi, M., Bui, T.D. (2015). Toward a Rule-Based Synthesis of Vietnamese Emotional Speech. In: Nguyen, VH., Le, AC., Huynh, VN. (eds) Knowledge and Systems Engineering. Advances in Intelligent Systems and Computing, vol 326. Springer, Cham. https://doi.org/10.1007/978-3-319-11680-8_11

  • DOI: https://doi.org/10.1007/978-3-319-11680-8_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11679-2

  • Online ISBN: 978-3-319-11680-8

  • eBook Packages: Engineering, Engineering (R0)
