
Emotion Recognition in the Wild with Feature Fusion and Multiple Kernel Learning

Authors:

Hong Fu

ICMI '14: Proceedings of the 16th International Conference on Multimodal Interaction

Pages 508 - 513

https://doi.org/10.1145/2663204.2666277

Published: 12 November 2014

Abstract

This paper presents our approach to the second Emotion Recognition in the Wild Challenge. We propose a new feature descriptor, Histogram of Oriented Gradients from Three Orthogonal Planes (HOG_TOP), to represent facial expressions. We also explore the properties of visual and audio features, and adopt Multiple Kernel Learning (MKL) to find an optimal feature fusion. An SVM with multiple kernels is trained for facial expression classification. Experimental results demonstrate that our method achieves promising performance: the overall classification accuracies on the validation set and test set are 40.21% and 45.21%, respectively.
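The kernel-level fusion mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common MKL formulation in which the fused kernel is a convex combination of base kernels, one per feature type. The RBF base kernels, the toy feature values, the function names, and the fixed weights are all assumptions made for illustration; in actual MKL the weights are learned together with the SVM, and the paper's own kernels and features are not given on this page.

```python
import math


def rbf_gram(features, gamma):
    """Gram matrix of an RBF kernel: K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    n = len(features)
    gram = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
            gram[i][j] = math.exp(-gamma * sq_dist)
    return gram


def combine_kernels(grams, weights):
    """Convex combination of base Gram matrices: K = sum_m weights[m] * K_m."""
    if len(grams) != len(weights):
        raise ValueError("need one weight per base kernel")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(grams[0])
    return [
        [sum(w * g[i][j] for w, g in zip(weights, grams)) for j in range(n)]
        for i in range(n)
    ]


# Toy per-clip descriptors (hypothetical values, not taken from the paper).
visual = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1]]
audio = [[1.0], [0.0], [0.1]]

k_visual = rbf_gram(visual, gamma=1.0)
k_audio = rbf_gram(audio, gamma=0.5)

# Fixed weights stand in for the weights an MKL solver would learn.
k_fused = combine_kernels([k_visual, k_audio], weights=[0.7, 0.3])
```

A fused Gram matrix of this kind could then be handed to any SVM implementation that accepts a precomputed kernel. Because a convex combination of positive semidefinite matrices is itself positive semidefinite, the result remains a valid kernel.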

Index Terms

Emotion Recognition in the Wild with Feature Fusion and Multiple Kernel Learning
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks

Information & Contributors

Information

Published In

ICMI '14: Proceedings of the 16th International Conference on Multimodal Interaction

November 2014

558 pages

ISBN:9781450328852

DOI:10.1145/2663204

General Chairs:
Albert Ali Salah (Boğaziçi University, Turkey)
Jeffrey Cohn (University of Pittsburgh, USA)
Björn Schuller (University of Passau, Germany and Imperial College London, UK)

Program Chairs:
Oya Aran (Idiap Research Institute, Switzerland)
Louis-Philippe Morency (University of Southern California, USA)
Philip R. Cohen (Adapx, USA)

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGCHI: ACM Special Interest Group on Computer-Human Interaction

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Hong Kong Polytechnic University

Conference

ICMI '14

Sponsor:

SIGCHI

ICMI '14: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION

November 12 - 16, 2014

Istanbul, Turkey

Acceptance Rates

ICMI '14 Paper Acceptance Rate 51 of 127 submissions, 40%;

Overall Acceptance Rate 453 of 1,080 submissions, 42%

Bibliometrics

Total Citations: 69
Total Downloads: 682
Downloads (Last 12 months): 15
Downloads (Last 6 weeks): 0

Reflects downloads up to 10 Feb 2025

Cited By

Nia A, Tang V, Maso Talou G, Billinghurst M (2025). Decoding emotions through personalized multi-modal fNIRS-EEG Systems: Exploring deterministic fusion techniques. Biomedical Signal Processing and Control, 105 (107632). Online publication date: Jul-2025.
https://doi.org/10.1016/j.bspc.2025.107632

Varshneya S, Ledent A, Liznerski P, Balinskyy A, Mehta P, Mustafa W, Kloft M, Larson K (2024). Interpretable tensor fusion. Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (5037-5045). Online publication date: 3-Aug-2024.
https://dl.acm.org/doi/10.24963/ijcai.2024/557

Wang L, Kang X, Ding F, Nakagawa S, Ren F (2024). MSSTNet: A Multi-Scale Spatio-Temporal CNN-Transformer Network for Dynamic Facial Expression Recognition. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (3015-3019). Online publication date: 14-Apr-2024.
https://doi.org/10.1109/ICASSP48485.2024.10446699

Geethanjali R, Valarmathi A (2024). A novel hybrid deep learning IChOA-CNN-LSTM model for modality-enriched and multilingual emotion recognition in social media. Scientific Reports, 14:1. Online publication date: 27-Sep-2024.
https://doi.org/10.1038/s41598-024-73452-2

Huang Z, Zhu Y, Li H, Yang D (2024). Dynamic facial expression recognition based on spatial key-points optimized region feature fusion and temporal self-attention. Engineering Applications of Artificial Intelligence, 133 (108535). Online publication date: Jul-2024.
https://doi.org/10.1016/j.engappai.2024.108535

Yan S, Wang Y, Mai X, Zhao Q, Song W, Huang J, Tao Z, Wang H, Gao S, Zhang W (2024). Empower smart cities with sampling-wise dynamic facial expression recognition via frame-sequence contrastive learning. Computer Communications, 216 (130-139). Online publication date: Feb-2024.
https://doi.org/10.1016/j.comcom.2023.12.032

Gong W, Qian Y, Zhou W, Leng H (2024). Enhanced spatial-temporal learning network for dynamic facial expression recognition. Biomedical Signal Processing and Control, 88 (105316). Online publication date: Feb-2024.
https://doi.org/10.1016/j.bspc.2023.105316

Wang L, Kang X, Ding F, Nakagawa S, Ren F (2024). A joint local spatial and global temporal CNN-Transformer for dynamic facial expression recognition. Applied Soft Computing, 161 (111680). Online publication date: Aug-2024.
https://doi.org/10.1016/j.asoc.2024.111680

Wang F, Liu Z, Lei J, Zou Z, Han W, Xu J, Li X, Feng Z, Liang R (2024). Dynamic-Static Graph Convolutional Network for Video-Based Facial Expression Recognition. MultiMedia Modeling (42-55). Online publication date: 28-Jan-2024.
https://doi.org/10.1007/978-3-031-53308-2_4

Zhu Q, Zhou Y, Yao Y, Sun L, Shi F, Shao W, Zhang D, Shen D (2023). Semi-Supervised Multi-View Fusion for Identifying CAP and COVID-19 With Unlabeled CT Images. IEEE Transactions on Emerging Topics in Computational Intelligence, 7:3 (887-899). Online publication date: Jun-2023.
https://doi.org/10.1109/TETCI.2022.3224937
