DOI: 10.1145/2993148.2997631

Multi-view common space learning for emotion recognition in the wild

Published: 31 October 2016

Abstract

Recognizing emotion in the wild is a challenging task. Recently, combining information from multiple views or modalities has attracted increasing attention: cross-modality features and features extracted by different methods can both be regarded as multi-view descriptions of a sample. In this paper, we propose a method that analyses multi-view features of emotion samples and automatically recognizes the expression, as part of the fourth Emotion Recognition in the Wild Challenge (EmotiW 2016). We first extract multi-view features such as BoF, CNN, LBP-TOP and audio features for each expression sample. We then learn projection matrices that map the multi-view features into a common subspace, while imposing ℓ2,1-norm penalties on the projection matrices for feature selection. We apply both this method and PLSR to emotion recognition. Experiments on the AFEW and HAPPEI datasets show superior performance: our best recognition accuracy for video-based emotion recognition in the wild is 0.5531 on AFEW, and our minimum RMSE for group happiness intensity estimation is 0.9525 on HAPPEI, both substantially better than the challenge baselines.
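
The paper ships no code, but the pipeline outlined above, per-view projection matrices learned into a shared space under an ℓ2,1-norm penalty, maps to a standard objective. The sketch below is a minimal NumPy illustration, not the authors' implementation: it minimizes sum_v ||X_v W_v - Y||_F^2 + lam * ||W_v||_{2,1} by iteratively reweighted least squares, and all names, shapes, and the choice of the label matrix as the common space are assumptions.

```python
import numpy as np

def fit_common_space(views, Y, lam=0.1, n_iter=50, eps=1e-8):
    """Learn one projection matrix per view into a common target space.

    Minimizes  sum_v ||X_v W_v - Y||_F^2 + lam * ||W_v||_{2,1}
    via iteratively reweighted least squares (IRLS), a standard
    surrogate for the non-smooth l2,1 row-sparsity penalty.

    views : list of (n_samples, d_v) feature matrices, one per view
    Y     : (n_samples, k) common-space targets (e.g., one-hot labels)
    """
    Ws = []
    for X in views:
        d = X.shape[1]
        D = np.eye(d)  # reweighting matrix, starts as identity
        W = np.zeros((d, Y.shape[1]))
        for _ in range(n_iter):
            # Ridge-like closed form: (X^T X + lam * D) W = X^T Y
            W = np.linalg.solve(X.T @ X + lam * D, X.T @ Y)
            # D_ii = 1 / (2 ||w_i||_2): rows with small norms get heavily
            # penalized, driving whole feature rows to zero (feature selection).
            row_norms = np.linalg.norm(W, axis=1)
            D = np.diag(1.0 / (2.0 * row_norms + eps))
        Ws.append(W)
    return Ws

# Hypothetical usage with two synthetic "views" of 7-class data:
rng = np.random.default_rng(0)
X_video = rng.normal(size=(100, 64))      # stand-in for visual features
X_audio = rng.normal(size=(100, 20))      # stand-in for audio features
Y = np.eye(7)[rng.integers(0, 7, 100)]    # one-hot emotion labels
W_video, W_audio = fit_common_space([X_video, X_audio], Y, lam=0.5)
scores = (X_video @ W_video + X_audio @ W_audio) / 2  # fuse in the common space
pred = scores.argmax(axis=1)
```

Once the per-view matrices are learned, any subset of views can be projected into the same space and fused there, which is the practical appeal of common-space methods when some modalities are missing or noisy.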
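
The abstract also applies PLSR (partial least squares regression), which suits the continuous group-happiness intensity task. A minimal scikit-learn sketch follows; the feature dimensions, intensity scale, and component count are illustrative placeholders, not the paper's configuration.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import mean_squared_error

# Synthetic stand-ins for extracted features and intensity labels.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(200, 128))
y_train = rng.uniform(0, 5, size=200)   # happiness intensity scale assumed
X_test = rng.normal(size=(50, 128))
y_test = rng.uniform(0, 5, size=50)

pls = PLSRegression(n_components=10)    # component count is a guess
pls.fit(X_train, y_train)
rmse = np.sqrt(mean_squared_error(y_test, pls.predict(X_test)))
print(f"Test RMSE: {rmse:.4f}")         # the paper reports 0.9525 on HAPPEI
```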

Published In

ICMI '16: Proceedings of the 18th ACM International Conference on Multimodal Interaction
October 2016
605 pages
ISBN: 9781450345569
DOI: 10.1145/2993148

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 31 October 2016

Author Tags

  1. Common Space Learning
  2. EmotiW 2016 Challenge
  3. Emotion Recognition
  4. Multi-view Learning

Qualifiers

  • Short-paper

Funding Sources

  • Microsoft Research Asia Collaborative Research Program
  • National Basic Research Program of China (973 Program)
  • National Natural Science Foundation of China (NSFC)

Conference

ICMI '16

Acceptance Rates

Overall acceptance rate: 453 of 1,080 submissions (42%)

Cited By

  • (2024) A Rotation-Invariant Texture ViT for Fine-Grained Recognition of Esophageal Cancer Endoscopic Ultrasound Images. Computer Vision – ECCV 2024, pages 360–377. DOI: 10.1007/978-3-031-72751-1_21
  • (2023) PerceptSent - Exploring Subjectivity in a Novel Dataset for Visual Sentiment Analysis. IEEE Transactions on Affective Computing, 14(3):1817–1831. DOI: 10.1109/TAFFC.2022.3225238
  • (2023) Automatic Emotion Recognition for Groups: A Review. IEEE Transactions on Affective Computing, 14(1):89–107. DOI: 10.1109/TAFFC.2021.3065726
  • (2023) A recent survey on perceived group sentiment analysis. Journal of Visual Communication and Image Representation, 97:103988. DOI: 10.1016/j.jvcir.2023.103988
  • (2022) Histogram Layers for Texture Analysis. IEEE Transactions on Artificial Intelligence, 3(4):541–552. DOI: 10.1109/TAI.2021.3135804
  • (2020) OutdoorSent. ACM Transactions on Information Systems, 38(3):1–28. DOI: 10.1145/3385186
  • (2019) Feature-Level and Model-Level Audiovisual Fusion for Emotion Recognition in the Wild. 2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), pages 443–448. DOI: 10.1109/MIPR.2019.00089
  • (2018) Sentiment Analysis in Outdoor Images Using Deep Learning. Proceedings of the 24th Brazilian Symposium on Multimedia and the Web, pages 181–188. DOI: 10.1145/3243082.3243093
  • (2018) Video-based Emotion Recognition using Aggregated Features and Spatio-temporal Information. 2018 24th International Conference on Pattern Recognition (ICPR), pages 2833–2838. DOI: 10.1109/ICPR.2018.8545441
  • (2018) Edge Convolutional Network for Facial Action Intensity Estimation. 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), pages 171–178. DOI: 10.1109/FG.2018.00034
