Duration Modeling for Emotional Speech

Lai, Wen-Hsing; Wang, Siou-Lin

doi:10.1007/978-3-642-34062-8_13

Wen-Hsing Lai & Siou-Lin Wang

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7473)

Abstract

Human interaction involves exchanging not only explicit content but also implicit information about the affective state of the interlocutor. In recent years, researchers have attempted to endow computers and robots with human-like qualities. Various affective computing models have been proposed, covering emotion recognition, interpretation, management, and generation. Analyzing and predicting the prosodic information of different emotions is therefore important for future applications. In this article, a duration modeling approach for emotional speech is presented. Seven emotions are adopted: natural, scare, angry, elation, sadness, surprise, and disgust. Based on statistics gathered from a corpus covering these seven emotions, a question set considering acoustic and linguistic factors is designed. Experimental results show that the root mean squared errors (RMSEs) of syllable duration are 0.0725 s and 0.0802 s for the training and testing sets, respectively. From the results, the impact of factors related to different emotions can be explored.
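
The abstract reports its results as RMSE values in seconds. As a reminder of what that metric measures, here is a minimal sketch of how an RMSE between predicted and observed syllable durations could be computed. The function name `rmse` and the sample durations are illustrative assumptions, not taken from the paper, and the paper's actual evaluation pipeline is not described in the abstract.

```python
import math


def rmse(predicted, observed):
    """Root mean squared error between two equal-length sequences.

    Both arguments are hypothetical lists of syllable durations in seconds.
    """
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up durations (seconds), used only to show the call; not data from the paper.
predicted_durations = [0.20, 0.25, 0.30]
observed_durations = [0.22, 0.21, 0.33]
print(f"RMSE: {rmse(predicted_durations, observed_durations):.4f} s")
```

Because the errors are squared before averaging, a few badly mispredicted syllables raise the RMSE more than many small deviations do, which is worth keeping in mind when comparing the training and testing figures.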

Author information

Authors and Affiliations

Dept. of Computer and Communication Engineering, National Kaohsiung First University of Science and Technology, No.2, Jhuoyue Rd., 81164, Kaohsiung, Taiwan
Wen-Hsing Lai & Siou-Lin Wang

Editor information

Editors and Affiliations

College of Science, Hebei United University, 063000, Tangshan, Hebei, China
Baoxiang Liu
Nanyang Technological University, Singapore
Maode Ma
College of Science, Hebei United University, 063009, Tangshan, Hebei, China
Jincai Chang

About this paper

Cite this paper

Lai, WH., Wang, SL. (2012). Duration Modeling for Emotional Speech. In: Liu, B., Ma, M., Chang, J. (eds) Information Computing and Applications. ICICA 2012. Lecture Notes in Computer Science, vol 7473. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34062-8_13

DOI: https://doi.org/10.1007/978-3-642-34062-8_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34061-1
Online ISBN: 978-3-642-34062-8
eBook Packages: Computer Science, Computer Science (R0)
