Research article · ICMR Conference Proceedings
DOI: 10.1145/3078971.3079030

Frame-Transformer Emotion Classification Network

Published: 06 June 2017

Abstract

Emotional content is a key ingredient in user-generated videos. However, because emotions are expressed only sparsely in user-generated video, analyzing them is very difficult. In this paper, we propose a new architecture, the Frame-Transformer Emotion Classification Network (FT-EC-net), to solve three highly correlated emotion-analysis tasks: emotion recognition, emotion attribution, and emotion-oriented summarization. We also contribute a new dataset for the emotion attribution task by annotating ground-truth labels for attribution segments. A comprehensive set of experiments on two datasets demonstrates the effectiveness of our framework.
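The page carries no implementation details, but the way the three tasks can share one mechanism is easy to illustrate with a toy frame-attention scheme: per-frame emotion scores are softmax-weighted into a video-level prediction (recognition), and the same weights identify the responsible frames (attribution) and candidate key frames for a summary. This is a hypothetical sketch in plain Python, not the authors' FT-EC-net; the scoring function and weighting rule are assumptions.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def analyze_video(frame_logits, top_k=2):
    """frame_logits: one list of per-class logits per frame.
    Returns (predicted class, indices of the top_k most
    emotion-relevant frames, in temporal order)."""
    # A frame's relevance = its strongest class response (an assumption).
    relevance = [max(frame) for frame in frame_logits]
    weights = softmax(relevance)

    # Recognition: attention-weighted average of frame logits.
    n_classes = len(frame_logits[0])
    video_logits = [
        sum(w * frame[c] for w, frame in zip(weights, frame_logits))
        for c in range(n_classes)
    ]
    label = max(range(n_classes), key=lambda c: video_logits[c])

    # Attribution / summarization: frames with the highest weights.
    key_frames = sorted(range(len(weights)), key=lambda i: -weights[i])[:top_k]
    return label, sorted(key_frames)

# Four frames, two emotion classes; frames 1 and 3 respond strongly to class 1.
label, key_frames = analyze_video([[0.1, 0.2], [0.0, 3.0], [0.2, 0.1], [0.1, 2.5]])
```

With this input, the weighting concentrates on frames 1 and 3, so the sketch predicts class 1 and returns those frames as the attributed segment.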




Published In

ICMR '17: Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval
June 2017
524 pages
ISBN:9781450347013
DOI:10.1145/3078971
  • General Chairs:
  • Bogdan Ionescu,
  • Nicu Sebe,
  • Program Chairs:
  • Jiashi Feng,
  • Martha Larson,
  • Rainer Lienhart,
  • Cees Snoek
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. video emotion attribution
  2. video emotion recognition
  3. video emotion-oriented summarization
  4. spatial-transformer network

Qualifiers

  • Research-article

Funding Sources

  • Shanghai Municipal Science and Technology Commission
  • Shanghai Sailing Program

Conference

ICMR '17

Acceptance Rates

ICMR '17 paper acceptance rate: 33 of 95 submissions (35%).
Overall acceptance rate: 254 of 830 submissions (31%).

Article Metrics

  • Downloads (last 12 months): 25
  • Downloads (last 6 weeks): 1
Reflects downloads up to 27 Feb 2025

Cited By

  • (2023) Weakly Supervised Video Emotion Detection and Prediction via Cross-Modal Temporal Erasing Network. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 18888-18897. DOI: 10.1109/CVPR52729.2023.01811. Online publication date: Jun-2023.
  • (2023) Joint multimodal sentiment analysis based on information relevance. Information Processing and Management: an International Journal 60(2). DOI: 10.1016/j.ipm.2022.103193. Online publication date: 1-Mar-2023.
  • (2022) Across the Universe: Biasing Facial Representations Toward Non-Universal Emotions With the Face-STN. IEEE Access 10, 103932-103947. DOI: 10.1109/ACCESS.2022.3210183. Online publication date: 2022.
  • (2021) Multimodal Emotion Recognition in Deep Learning: a Survey. 2021 International Conference on Culture-oriented Science & Technology (ICCST), 77-82. DOI: 10.1109/ICCST53801.2021.00027. Online publication date: Nov-2021.
  • (2021) User-generated video emotion recognition based on key frames. Multimedia Tools and Applications 80(9), 14343-14361. DOI: 10.1007/s11042-020-10203-1. Online publication date: 1-Apr-2021.
  • (2020) WSCNet: Weakly Supervised Coupled Networks for Visual Sentiment Classification and Detection. IEEE Transactions on Multimedia 22(5), 1358-1371. DOI: 10.1109/TMM.2019.2939744. Online publication date: May-2020.
  • (2020) A Multi-Task Neural Approach for Emotion Attribution, Classification, and Summarization. IEEE Transactions on Multimedia 22(1), 148-159. DOI: 10.1109/TMM.2019.2922129. Online publication date: 1-Jan-2020.
  • (2020) A probability and integrated learning based classification algorithm for high-level human emotion recognition problems. Measurement 150, 107049. DOI: 10.1016/j.measurement.2019.107049. Online publication date: Jan-2020.
  • (2019) Deep Temporal-Spatial Aggregation for Video-Based Facial Expression Recognition. Symmetry 11(1), 52. DOI: 10.3390/sym11010052. Online publication date: 5-Jan-2019.
  • (2019) Multiple Level Hierarchical Network-Based Clause Selection for Emotion Cause Extraction. IEEE Access 7, 9071-9079. DOI: 10.1109/ACCESS.2018.2890390. Online publication date: 2019.
