
Word-Level Emotion Recognition Using High-Level Features

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2014)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 8404)

Abstract

In this paper, we investigate the use of high-level features for recognizing human emotions at the word level in natural conversations with virtual agents. Experiments were carried out on the 2012 Audio/Visual Emotion Challenge (AVEC2012) database, where emotions are defined as vectors in the Arousal-Expectancy-Power-Valence emotional space. Our model using 6 novel disfluency features yields significant improvements over models using a large number of low-level spectral and prosodic features, and its overall performance does not differ significantly from that of the best model in the AVEC2012 Word-Level Sub-Challenge. Our visual model using Active Shape Model features also yields significant improvements over models using the low-level Local Binary Patterns visual features. We built a bimodal model by combining our disfluency and visual feature sets and applying Correlation-based Feature-subset Selection. Considering overall performance on all emotion dimensions, our bimodal model outperforms the second-best model of the challenge and comes close to the best model. It also gives the best result when predicting Expectancy values.
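The Correlation-based Feature-subset Selection step mentioned above can be illustrated with a minimal sketch of Hall's CFS merit heuristic combined with a greedy forward search. This is not the authors' implementation: the function names (`cfs_merit`, `cfs_forward_select`), the synthetic feature matrix `X`, and the target `y` are all illustrative assumptions; the idea is only that CFS favors features that correlate with the target while penalizing redundancy among the selected features.

```python
import numpy as np

def cfs_merit(X, y, subset):
    """Hall's CFS merit for a feature subset:
    k * r_cf / sqrt(k + k*(k-1) * r_ff),
    where r_cf is the mean feature-target correlation and
    r_ff is the mean pairwise feature-feature correlation."""
    k = len(subset)
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        r_ff = 0.0
    else:
        pairs = [abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                 for i, a in enumerate(subset) for b in subset[i + 1:]]
        r_ff = np.mean(pairs)
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def cfs_forward_select(X, y):
    """Greedy forward search: add the feature that most improves the
    merit, and stop when no candidate beats the current best subset."""
    remaining = list(range(X.shape[1]))
    selected, best_merit = [], -np.inf
    while remaining:
        merit, j = max((cfs_merit(X, y, selected + [j]), j) for j in remaining)
        if merit <= best_merit:
            break
        selected.append(j)
        remaining.remove(j)
        best_merit = merit
    return selected
```

In a setup like the paper's, `X` would be the feature-level fusion of the two modalities (e.g. `np.hstack` of the per-word disfluency and Active Shape Model feature vectors), with selection applied to the fused matrix. A redundant copy of an already-selected feature raises `r_ff` as much as it raises `r_cf`, so the merit does not improve and the copy is rejected.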




Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Moore, J.D., Tian, L., Lai, C. (2014). Word-Level Emotion Recognition Using High-Level Features. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54903-8_2

  • DOI: https://doi.org/10.1007/978-3-642-54903-8_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-54902-1

  • Online ISBN: 978-3-642-54903-8

  • eBook Packages: Computer Science (R0)
