DOI: 10.1145/2993148.2997631

Multi-view common space learning for emotion recognition in the wild

Published: 31 October 2016

Abstract

Recognizing emotion in the wild is a challenging task. Recently, combining information from multiple views or modalities has attracted increasing attention: cross-modality features and features extracted by different methods can both be regarded as multi-view descriptions of a sample. In this paper, we propose a method that analyses multi-view features of emotion samples and automatically recognizes the expression, as part of the fourth Emotion Recognition in the Wild Challenge (EmotiW 2016). We first extract multi-view features such as BoF, CNN, LBP-TOP and audio features for each expression sample. We then learn projection matrices that map the multi-view features into a common subspace, while imposing ℓ2,1-norm penalties on the projection matrices for feature selection. We apply both this method and PLSR to emotion recognition. Experiments on the AFEW and HAPPEI datasets show superior performance: our best recognition accuracy for video-based emotion recognition in the wild is 0.5531 on AFEW, and our minimum RMSE for group happiness intensity estimation is 0.9525 on HAPPEI, both substantially better than the challenge baselines.
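
The paper ships no code, but the pipeline outlined above, per-view projection matrices learned into a shared space under an ℓ2,1-norm penalty, maps to a standard objective. The sketch below is a minimal NumPy illustration, not the authors' implementation: it minimizes sum_v ||X_v W_v - Y||_F^2 + lam * ||W_v||_{2,1} by iteratively reweighted least squares, and all names, shapes, and the choice of the label matrix as the common space are assumptions.

```python
import numpy as np

def fit_common_space(views, Y, lam=0.1, n_iter=50, eps=1e-8):
    """Learn one projection matrix per view into a common target space.

    Minimizes  sum_v ||X_v W_v - Y||_F^2 + lam * ||W_v||_{2,1}
    via iteratively reweighted least squares (IRLS), a standard
    surrogate for the non-smooth l2,1 row-sparsity penalty.

    views : list of (n_samples, d_v) feature matrices, one per view
    Y     : (n_samples, k) common-space targets (e.g., one-hot labels)
    """
    Ws = []
    for X in views:
        d = X.shape[1]
        D = np.eye(d)  # reweighting matrix, starts as identity
        W = np.zeros((d, Y.shape[1]))
        for _ in range(n_iter):
            # Ridge-like closed form: (X^T X + lam * D) W = X^T Y
            W = np.linalg.solve(X.T @ X + lam * D, X.T @ Y)
            # D_ii = 1 / (2 ||w_i||_2): rows with small norms get heavily
            # penalized, driving whole feature rows to zero (feature selection).
            row_norms = np.linalg.norm(W, axis=1)
            D = np.diag(1.0 / (2.0 * row_norms + eps))
        Ws.append(W)
    return Ws

# Hypothetical usage with two synthetic "views" of 7-class data:
rng = np.random.default_rng(0)
X_video = rng.normal(size=(100, 64))      # stand-in for visual features
X_audio = rng.normal(size=(100, 20))      # stand-in for audio features
Y = np.eye(7)[rng.integers(0, 7, 100)]    # one-hot emotion labels
W_video, W_audio = fit_common_space([X_video, X_audio], Y, lam=0.5)
scores = (X_video @ W_video + X_audio @ W_audio) / 2  # fuse in the common space
pred = scores.argmax(axis=1)
```

Once the per-view matrices are learned, any subset of views can be projected into the same space and fused there, which is the practical appeal of common-space methods when some modalities are missing or noisy.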
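
The abstract also applies PLSR (partial least squares regression), which suits the continuous group-happiness intensity task. A minimal scikit-learn sketch follows; the feature dimensions, intensity scale, and component count are illustrative placeholders, not the paper's configuration.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import mean_squared_error

# Synthetic stand-ins for extracted features and intensity labels.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(200, 128))
y_train = rng.uniform(0, 5, size=200)   # happiness intensity scale assumed
X_test = rng.normal(size=(50, 128))
y_test = rng.uniform(0, 5, size=50)

pls = PLSRegression(n_components=10)    # component count is a guess
pls.fit(X_train, y_train)
rmse = np.sqrt(mean_squared_error(y_test, pls.predict(X_test)))
print(f"Test RMSE: {rmse:.4f}")         # the paper reports 0.9525 on HAPPEI
```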

Published In

ICMI '16: Proceedings of the 18th ACM International Conference on Multimodal Interaction
October 2016
605 pages
ISBN: 9781450345569
DOI: 10.1145/2993148

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 31 October 2016

Author Tags

  1. Common Space Learning
  2. EmotiW 2016 Challenge
  3. Emotion Recognition
  4. Multi-view Learning

Qualifiers

  • Short-paper

Funding Sources

  • Microsoft Research Asia Collaborative Research Program
  • National Basic Research Program of China (973 Program)
  • National Natural Science Foundation of China (NSFC)

Conference

ICMI '16

Acceptance Rates

Overall acceptance rate: 453 of 1,080 submissions (42%)

Cited By

  • (2024) A Rotation-Invariant Texture ViT for Fine-Grained Recognition of Esophageal Cancer Endoscopic Ultrasound Images. Computer Vision – ECCV 2024, pages 360–377. DOI: 10.1007/978-3-031-72751-1_21
  • (2023) PerceptSent - Exploring Subjectivity in a Novel Dataset for Visual Sentiment Analysis. IEEE Transactions on Affective Computing, 14(3):1817–1831. DOI: 10.1109/TAFFC.2022.3225238
  • (2023) Automatic Emotion Recognition for Groups: A Review. IEEE Transactions on Affective Computing, 14(1):89–107. DOI: 10.1109/TAFFC.2021.3065726
  • (2023) A recent survey on perceived group sentiment analysis. Journal of Visual Communication and Image Representation, 97:103988. DOI: 10.1016/j.jvcir.2023.103988
  • (2022) Histogram Layers for Texture Analysis. IEEE Transactions on Artificial Intelligence, 3(4):541–552. DOI: 10.1109/TAI.2021.3135804
  • (2020) OutdoorSent. ACM Transactions on Information Systems, 38(3):1–28. DOI: 10.1145/3385186
  • (2019) Feature-Level and Model-Level Audiovisual Fusion for Emotion Recognition in the Wild. 2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), pages 443–448. DOI: 10.1109/MIPR.2019.00089
  • (2018) Sentiment Analysis in Outdoor Images Using Deep Learning. Proceedings of the 24th Brazilian Symposium on Multimedia and the Web, pages 181–188. DOI: 10.1145/3243082.3243093
  • (2018) Video-based Emotion Recognition using Aggregated Features and Spatio-temporal Information. 2018 24th International Conference on Pattern Recognition (ICPR), pages 2833–2838. DOI: 10.1109/ICPR.2018.8545441
  • (2018) Edge Convolutional Network for Facial Action Intensity Estimation. 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), pages 171–178. DOI: 10.1109/FG.2018.00034
