DOI: 10.1145/2567688.2567690

Emotion Recognition from Audio and Visual Data using F-score based Fusion

Published: 21 March 2014

Abstract

Emotion recognition has long been one of the cornerstones of human-computer interaction. Although decades of work have attacked the problem of automatic emotion recognition from either audio or video signals alone, the fusion of the two modalities is more recent. In this paper, we tackle the problem in the setting where audio and video data are available in synchronized form. We address the six basic human emotions: anger, disgust, fear, happiness, sadness, and surprise. We employ an automatic face tracker to extract facial points of interest from a video, and then compute a feature vector for each video frame from the distances and angles between the tracked points. For audio data, we use pitch, energy, and MFCCs to derive feature vectors both for each window and for the entire signal. We use two standard techniques, GMM-based HMMs and SVMs, as base classifiers, and design a novel fusion method based on the F-scores of the base classifiers. We first demonstrate that our fusion approach can increase the accuracy of the base classifiers by as much as 5%. Finally, we show that the resulting bi-modal emotion recognition method achieves an overall accuracy of 54% on a publicly available database, improving upon the current state of the art by 9%.



Published In

CODS '14: Proceedings of the 1st IKDD Conference on Data Sciences
March 2014
73 pages
ISBN:9781450324755
DOI:10.1145/2567688

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Audio-visual data
  2. Emotion recognition
  3. F-score
  4. Multi-class fusion

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

CoDS '14: 1st IKDD Conference on Data Sciences
March 21 - 23, 2014
Delhi, India

Acceptance Rates

CODS '14 paper acceptance rate: 7 of 57 submissions (12%)
Overall acceptance rate: 197 of 680 submissions (29%)


Cited By

  • (2021) Multimodal emotion recognition using SDA-LDA algorithm in video clips. Journal of Ambient Intelligence and Humanized Computing, 14(6):6585-6602. DOI: 10.1007/s12652-021-03529-7
  • (2020) Research on the Phonetic Emotion Recognition Model of Mandarin Chinese. In 2020 International Conference on Culture-oriented Science & Technology (ICCST), pages 602-606. DOI: 10.1109/ICCST50977.2020.00124
  • (2020) Psychological Personal Support System with Long Short Term Memory and Facial Expressions Recognition Approach. In Deep Learning for Medical Decision Support Systems, pages 129-144. DOI: 10.1007/978-981-15-6325-6_8
  • (2018) A Multimodal Emotion Recognition System Using Facial Landmark Analysis. Iranian Journal of Science and Technology, Transactions of Electrical Engineering, 43(S1):171-189. DOI: 10.1007/s40998-018-0142-9
  • (2016) Fusion of classifier predictions for audio-visual emotion recognition. In 2016 23rd International Conference on Pattern Recognition (ICPR), pages 61-66. DOI: 10.1109/ICPR.2016.7899608
