Research article
DOI: 10.1145/2911996.2912006

Video Emotion Recognition with Transferred Deep Feature Encodings

Published: 06 June 2016

Abstract

Despite growing research interest, emotion understanding for user-generated videos remains a challenging problem. Major obstacles include the diversity and complexity of video content, as well as the sparsity of expressed emotions. For the first time, we systematically study large-scale video emotion recognition by transferring deep feature encodings. In addition to traditional supervised recognition, we study zero-shot emotion recognition, where the emotions in the test set are unseen during training. To address this task, we utilize knowledge transferred from auxiliary image and text corpora. A novel auxiliary Image Transfer Encoding (ITE) process is proposed to efficiently encode and generate the video representation. We also thoroughly investigate different configurations of convolutional neural networks. Comprehensive experiments on multiple datasets demonstrate the effectiveness of our framework.
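The abstract is the only technical description on this page, so the Python sketch below is purely illustrative: it shows one plausible shape for the pipeline the abstract outlines, in which deep features of auxiliary images are clustered into a concept codebook, each video is encoded as a histogram of its frames' nearest concepts, and unseen emotions are scored by matching a concept-weighted word embedding against emotion-name embeddings. Every name and design choice here is an assumption for illustration; the actual ITE formulation is defined in the paper itself.

# Hypothetical sketch only: NOT the authors' ITE implementation. All names
# and steps are assumptions loosely inspired by the abstract.
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(aux_image_features, n_concepts=256):
    # Cluster deep features of auxiliary emotion-related images into
    # "concept" centroids (a stand-in for the paper's transferred encoding).
    return KMeans(n_clusters=n_concepts, n_init=10, random_state=0).fit(aux_image_features)

def encode_video(frame_features, codebook):
    # Represent a video as an L1-normalized histogram of its frames'
    # nearest auxiliary concepts.
    assignments = codebook.predict(frame_features)
    hist = np.bincount(assignments, minlength=codebook.n_clusters).astype(np.float64)
    return hist / max(hist.sum(), 1.0)

def zero_shot_scores(video_hist, concept_word_vecs, emotion_word_vecs):
    # Project the video into word-embedding space via its concept weights,
    # then rank unseen emotion names by cosine similarity.
    v = video_hist @ concept_word_vecs                      # shape (d,)
    sims = emotion_word_vecs @ v                            # shape (n_emotions,)
    norms = np.linalg.norm(emotion_word_vecs, axis=1) * np.linalg.norm(v)
    return sims / (norms + 1e-12)

In the supervised setting, the same video histograms could instead be fed to a standard classifier such as a linear SVM; in the zero-shot setting, the highest-scoring unseen emotion would be taken as the prediction. Again, this is a guess at the structure under the stated assumptions, not the paper's exact method.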



Published In

ICMR '16: Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval
June 2016
452 pages
ISBN: 978-1-4503-4359-6
DOI: 10.1145/2911996
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 June 2016


Author Tags

  1. deep learning
  2. video emotion
  3. zero-shot emotion recognition

Qualifiers

  • Research-article

Conference

ICMR '16: International Conference on Multimedia Retrieval
June 6-9, 2016
New York, New York, USA

Acceptance Rates

ICMR '16 paper acceptance rate: 20 of 120 submissions (17%)
Overall acceptance rate: 254 of 830 submissions (31%)

Article Metrics

  • Downloads (last 12 months): 18
  • Downloads (last 6 weeks): 0
Reflects downloads up to 27 Feb 2025

Cited By

  • (2024) A supervised contrastive learning-based model for image emotion classification. World Wide Web, 27(3). DOI: 10.1007/s11280-024-01260-9. Online publication date: 24-Apr-2024.
  • (2023) Recognition of Emotions in User-Generated Videos through Frame-Level Adaptation and Emotion Intensity Learning. IEEE Transactions on Multimedia, 25:881-891. DOI: 10.1109/TMM.2021.3134167.
  • (2023) Emotion and Gesture Guided Action Recognition in Videos Using Supervised Deep Networks. IEEE Transactions on Computational Social Systems, 10(5):2546-2556. DOI: 10.1109/TCSS.2022.3187198. Online publication date: Oct-2023.
  • (2023) Direct side information learning for zero-shot regression. Neurocomputing, 561:126873. DOI: 10.1016/j.neucom.2023.126873. Online publication date: Dec-2023.
  • (2023) Efficient facial expression recognition framework based on edge computing. The Journal of Supercomputing, 80(2):1935-1972. DOI: 10.1007/s11227-023-05548-x. Online publication date: 24-Jul-2023.
  • (2023) Innovations and Insights of Sequence-Based Emotion Detection in Human Face Through Deep Learning. Emerging Trends in Expert Applications and Security, 385-395. DOI: 10.1007/978-981-99-1909-3_33. Online publication date: 13-Jun-2023.
  • (2023) Multimodal Cross-Attention Graph Network for Desire Detection. Artificial Neural Networks and Machine Learning – ICANN 2023, 512-523. DOI: 10.1007/978-3-031-44216-2_42. Online publication date: 22-Sep-2023.
  • (2022) Hyperspectral Imaging Zero-Shot Learning for Remote Marine Litter Detection and Classification. Remote Sensing, 14(21):5516. DOI: 10.3390/rs14215516. Online publication date: 2-Nov-2022.
  • (2022) Region Dual Attention-Based Video Emotion Recognition. Computational Intelligence and Neuroscience, 2022. DOI: 10.1155/2022/6096325.
  • (2022) Sparse Spatial-Temporal Emotion Graph Convolutional Network for Video Emotion Recognition. Computational Intelligence and Neuroscience, 2022. DOI: 10.1155/2022/3518879.
