Research article
DOI: 10.1145/2911996.2912006

Video Emotion Recognition with Transferred Deep Feature Encodings

Published: 06 June 2016

Abstract

Despite growing research interest, emotion understanding for user-generated videos remains a challenging problem. Major obstacles include the diversity and complexity of video content, as well as the sparsity of expressed emotions. For the first time, we systematically study large-scale video emotion recognition by transferring deep feature encodings. In addition to traditional supervised recognition, we study zero-shot emotion recognition, where the emotions in the test set are unseen during training. To address this task, we utilize knowledge transferred from auxiliary image and text corpora. A novel auxiliary Image Transfer Encoding (ITE) process is proposed to efficiently encode and generate the video representation. We also thoroughly investigate different configurations of convolutional neural networks. Comprehensive experiments on multiple datasets demonstrate the effectiveness of our framework.
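The abstract is the only technical description on this page, so the Python sketch below is purely illustrative: it shows one plausible shape for the pipeline the abstract outlines, in which deep features of auxiliary images are clustered into a concept codebook, each video is encoded as a histogram of its frames' nearest concepts, and unseen emotions are scored by matching a concept-weighted word embedding against emotion-name embeddings. Every name and design choice here is an assumption for illustration; the actual ITE formulation is defined in the paper itself.

# Hypothetical sketch only: NOT the authors' ITE implementation. All names
# and steps are assumptions loosely inspired by the abstract.
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(aux_image_features, n_concepts=256):
    # Cluster deep features of auxiliary emotion-related images into
    # "concept" centroids (a stand-in for the paper's transferred encoding).
    return KMeans(n_clusters=n_concepts, n_init=10, random_state=0).fit(aux_image_features)

def encode_video(frame_features, codebook):
    # Represent a video as an L1-normalized histogram of its frames'
    # nearest auxiliary concepts.
    assignments = codebook.predict(frame_features)
    hist = np.bincount(assignments, minlength=codebook.n_clusters).astype(np.float64)
    return hist / max(hist.sum(), 1.0)

def zero_shot_scores(video_hist, concept_word_vecs, emotion_word_vecs):
    # Project the video into word-embedding space via its concept weights,
    # then rank unseen emotion names by cosine similarity.
    v = video_hist @ concept_word_vecs                      # shape (d,)
    sims = emotion_word_vecs @ v                            # shape (n_emotions,)
    norms = np.linalg.norm(emotion_word_vecs, axis=1) * np.linalg.norm(v)
    return sims / (norms + 1e-12)

In the supervised setting, the same video histograms could instead be fed to a standard classifier such as a linear SVM; in the zero-shot setting, the highest-scoring unseen emotion would be taken as the prediction. Again, this is a guess at the structure under the stated assumptions, not the paper's exact method.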



Published In

ICMR '16: Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval
June 2016
452 pages
ISBN: 978-1-4503-4359-6
DOI: 10.1145/2911996
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 June 2016


Author Tags

  1. deep learning
  2. video emotion
  3. zero-shot emotion recognition

Qualifiers

  • Research-article

Conference

ICMR '16: International Conference on Multimedia Retrieval
June 6-9, 2016
New York, New York, USA

Acceptance Rates

ICMR '16 paper acceptance rate: 20 of 120 submissions (17%)
Overall acceptance rate: 254 of 830 submissions (31%)

Article Metrics

  • Downloads (last 12 months): 18
  • Downloads (last 6 weeks): 0
Reflects downloads up to 27 Feb 2025

Cited By

  • (2024) A supervised contrastive learning-based model for image emotion classification. World Wide Web, 27(3). DOI: 10.1007/s11280-024-01260-9. Online publication date: 24-Apr-2024.
  • (2023) Recognition of Emotions in User-Generated Videos through Frame-Level Adaptation and Emotion Intensity Learning. IEEE Transactions on Multimedia, 25:881-891. DOI: 10.1109/TMM.2021.3134167.
  • (2023) Emotion and Gesture Guided Action Recognition in Videos Using Supervised Deep Networks. IEEE Transactions on Computational Social Systems, 10(5):2546-2556. DOI: 10.1109/TCSS.2022.3187198. Online publication date: Oct-2023.
  • (2023) Direct side information learning for zero-shot regression. Neurocomputing, 561:126873. DOI: 10.1016/j.neucom.2023.126873. Online publication date: Dec-2023.
  • (2023) Efficient facial expression recognition framework based on edge computing. The Journal of Supercomputing, 80(2):1935-1972. DOI: 10.1007/s11227-023-05548-x. Online publication date: 24-Jul-2023.
  • (2023) Innovations and Insights of Sequence-Based Emotion Detection in Human Face Through Deep Learning. Emerging Trends in Expert Applications and Security, 385-395. DOI: 10.1007/978-981-99-1909-3_33. Online publication date: 13-Jun-2023.
  • (2023) Multimodal Cross-Attention Graph Network for Desire Detection. Artificial Neural Networks and Machine Learning – ICANN 2023, 512-523. DOI: 10.1007/978-3-031-44216-2_42. Online publication date: 22-Sep-2023.
  • (2022) Hyperspectral Imaging Zero-Shot Learning for Remote Marine Litter Detection and Classification. Remote Sensing, 14(21):5516. DOI: 10.3390/rs14215516. Online publication date: 2-Nov-2022.
  • (2022) Region Dual Attention-Based Video Emotion Recognition. Computational Intelligence and Neuroscience, 2022. DOI: 10.1155/2022/6096325.
  • (2022) Sparse Spatial-Temporal Emotion Graph Convolutional Network for Video Emotion Recognition. Computational Intelligence and Neuroscience, 2022. DOI: 10.1155/2022/3518879.
