Abstract
Real human-computer interaction systems based on multiple modalities face the problem that not all information channels are available at every time step. Nevertheless, an estimate of the current user state is required at any time so that the system can interact instantaneously based on whichever modalities are available. A novel approach to the decision fusion of such fragmentary classifications is therefore proposed and empirically evaluated on the audio and video signals of a corpus of non-acted user behavior. It is shown that the visual and prosodic analyses successfully complement each other, leading to outstanding performance of the fusion architecture.
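To make the core idea concrete, the following is a minimal sketch of late decision fusion over fragmentary classifier outputs, in Python with hypothetical names; it illustrates the general principle, not the specific fusion architecture evaluated in the paper. Per-modality class posteriors are combined by a weighted average that simply skips channels that are unavailable at the current time step, so an estimate can still be produced from whatever evidence is present.

```python
import numpy as np

def fuse_fragmentary_decisions(posteriors, weights=None):
    """Combine per-modality class posteriors, skipping missing channels.

    posteriors: dict mapping modality name -> array of class
        probabilities, or None when that channel delivered no
        decision at this time step.
    weights: optional dict of per-modality reliability weights
        (hypothetical; defaults to uniform weighting).
    Returns the fused class-probability vector, or None if no
    modality was observed at all.
    """
    available = {m: p for m, p in posteriors.items() if p is not None}
    if not available:
        return None  # no channel observed; caller must fall back (e.g., to the previous estimate)
    if weights is None:
        weights = {m: 1.0 for m in available}
    # Weighted sum of the available posteriors, renormalised to a distribution.
    fused = sum(weights[m] * np.asarray(p) for m, p in available.items())
    return fused / fused.sum()

# Example time step: the video channel is unavailable, audio and prosody are present.
frame = {"video": None,
         "audio": np.array([0.6, 0.3, 0.1]),
         "prosody": np.array([0.5, 0.4, 0.1])}
print(fuse_fragmentary_decisions(frame))  # fused 3-class posterior
```

In a full system, a temporal integration stage over successive estimates would typically follow this per-time-step combination; the sketch omits that part.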
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Krell, G. et al. (2013). Fusion of Fragmentary Classifier Decisions for Affective State Recognition. In: Schwenker, F., Scherer, S., Morency, L.-P. (eds.) Multimodal Pattern Recognition of Social Signals in Human-Computer-Interaction. MPRSS 2012. Lecture Notes in Computer Science, vol. 7742. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37081-6_13
DOI: https://doi.org/10.1007/978-3-642-37081-6_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37080-9
Online ISBN: 978-3-642-37081-6