DOI: 10.1145/2663204.2666277
research-article

Emotion Recognition in the Wild with Feature Fusion and Multiple Kernel Learning

Published: 12 November 2014

Abstract

This paper presents our approach to the second Emotion Recognition in the Wild Challenge. We propose a new feature descriptor, Histogram of Oriented Gradients from Three Orthogonal Planes (HOG_TOP), to represent facial expressions. We also explore the properties of visual and audio features and adopt Multiple Kernel Learning (MKL) to find an optimal feature fusion. An SVM with multiple kernels is trained for facial expression classification. Experimental results show that our method achieves promising performance: the overall classification accuracies on the validation and test sets are 40.21% and 45.21%, respectively.
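The abstract names HOG_TOP but does not describe how it is built. The sketch below is one plausible reading, assuming a construction analogous to LBP-TOP: a magnitude-weighted histogram of gradient orientations is computed in each of the XY, XT, and YT planes of a video volume, and the three histograms are concatenated. The function names, the bin count, the use of unsigned orientations, and the omission of cell and block normalization are all illustrative assumptions, not the authors' implementation.

```python
import math


def plane_histogram(volume, axis_a, axis_b, n_bins=8):
    """Magnitude-weighted histogram of gradient orientations in one plane.

    volume is indexed volume[t][y][x]; axis_a and axis_b select the two
    axes (0 = t, 1 = y, 2 = x) spanning the plane. Hypothetical helper.
    """
    dims = (len(volume), len(volume[0]), len(volume[0][0]))
    hist = [0.0] * n_bins

    def at(idx):
        return volume[idx[0]][idx[1]][idx[2]]

    def central_diff(idx, axis):
        lo, hi = list(idx), list(idx)
        lo[axis] -= 1
        hi[axis] += 1
        return (at(hi) - at(lo)) / 2.0

    # Interior voxels only, so every central difference is defined.
    for t in range(1, dims[0] - 1):
        for y in range(1, dims[1] - 1):
            for x in range(1, dims[2] - 1):
                idx = (t, y, x)
                ga = central_diff(idx, axis_a)
                gb = central_diff(idx, axis_b)
                mag = math.hypot(ga, gb)
                if mag == 0.0:
                    continue
                # Unsigned orientation in [0, pi) (an assumption).
                angle = math.atan2(ga, gb) % math.pi
                b = min(int(angle / math.pi * n_bins), n_bins - 1)
                hist[b] += mag

    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


def hog_top(volume, n_bins=8):
    """Concatenate orientation histograms from the XY, XT and YT planes."""
    xy = plane_histogram(volume, 1, 2, n_bins)  # spatial appearance
    xt = plane_histogram(volume, 0, 2, n_bins)  # horizontal change over time
    yt = plane_histogram(volume, 0, 1, n_bins)  # vertical change over time
    return xy + xt + yt


# Usage: a static clip whose intensity increases along x only.
clip = [[[float(x) for x in range(5)] for _ in range(4)] for _ in range(3)]
features = hog_top(clip)
print(len(features))
```

In a fusion setting like the one the abstract describes, a descriptor of this kind would give one kernel and the audio features another, with the kernel weights learned by MKL rather than fixed by hand.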

    Published In

    ICMI '14: Proceedings of the 16th International Conference on Multimodal Interaction
    November 2014
    558 pages
    ISBN:9781450328852
    DOI:10.1145/2663204
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. emotion recognition
    2. feature fusion
    3. hog_top
    4. multiple kernel learning
    5. support vector machine

    Qualifiers

    • Research-article

    Acceptance Rates

    ICMI '14 Paper Acceptance Rate 51 of 127 submissions, 40%;
    Overall Acceptance Rate 453 of 1,080 submissions, 42%
