Research article · ICMR Conference Proceedings
DOI: 10.1145/3078971.3079030

Frame-Transformer Emotion Classification Network

Published: 06 June 2017

Abstract

Emotional content is a key ingredient in user-generated videos. However, because emotions are expressed only sparsely in user-generated video, analyzing them is very difficult. In this paper, we propose a new architecture, the Frame-Transformer Emotion Classification Network (FT-EC-net), to solve three highly correlated emotion-analysis tasks: emotion recognition, emotion attribution, and emotion-oriented summarization. We also contribute a new dataset for the emotion attribution task by annotating ground-truth labels for attribution segments. A comprehensive set of experiments on two datasets demonstrates the effectiveness of our framework.
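The page carries no implementation details, but the way the three tasks can share one mechanism is easy to illustrate with a toy frame-attention scheme: per-frame emotion scores are softmax-weighted into a video-level prediction (recognition), and the same weights identify the responsible frames (attribution) and candidate key frames for a summary. This is a hypothetical sketch in plain Python, not the authors' FT-EC-net; the scoring function and weighting rule are assumptions.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def analyze_video(frame_logits, top_k=2):
    """frame_logits: one list of per-class logits per frame.
    Returns (predicted class, indices of the top_k most
    emotion-relevant frames, in temporal order)."""
    # A frame's relevance = its strongest class response (an assumption).
    relevance = [max(frame) for frame in frame_logits]
    weights = softmax(relevance)

    # Recognition: attention-weighted average of frame logits.
    n_classes = len(frame_logits[0])
    video_logits = [
        sum(w * frame[c] for w, frame in zip(weights, frame_logits))
        for c in range(n_classes)
    ]
    label = max(range(n_classes), key=lambda c: video_logits[c])

    # Attribution / summarization: frames with the highest weights.
    key_frames = sorted(range(len(weights)), key=lambda i: -weights[i])[:top_k]
    return label, sorted(key_frames)

# Four frames, two emotion classes; frames 1 and 3 respond strongly to class 1.
label, key_frames = analyze_video([[0.1, 0.2], [0.0, 3.0], [0.2, 0.1], [0.1, 2.5]])
```

With this input, the weighting concentrates on frames 1 and 3, so the sketch predicts class 1 and returns those frames as the attributed segment.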




Published In

ICMR '17: Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval
June 2017
524 pages
ISBN:9781450347013
DOI:10.1145/3078971
  • General Chairs:
  • Bogdan Ionescu,
  • Nicu Sebe,
  • Program Chairs:
  • Jiashi Feng,
  • Martha Larson,
  • Rainer Lienhart,
  • Cees Snoek
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. video emotion attribution
  2. video emotion recognition
  3. video emotion-oriented summarization
  4. spatial-transformer network

Qualifiers

  • Research-article

Funding Sources

  • Shanghai Municipal Science and Technology Commission
  • Shanghai Sailing Program

Conference

ICMR '17

Acceptance Rates

ICMR '17 paper acceptance rate: 33 of 95 submissions (35%).
Overall acceptance rate: 254 of 830 submissions (31%).

Article Metrics

  • Downloads (last 12 months): 25
  • Downloads (last 6 weeks): 1
Reflects downloads up to 27 Feb 2025

Cited By

  • (2023) Weakly Supervised Video Emotion Detection and Prediction via Cross-Modal Temporal Erasing Network. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 18888-18897. DOI: 10.1109/CVPR52729.2023.01811. Online publication date: Jun-2023.
  • (2023) Joint multimodal sentiment analysis based on information relevance. Information Processing and Management: an International Journal 60(2). DOI: 10.1016/j.ipm.2022.103193. Online publication date: 1-Mar-2023.
  • (2022) Across the Universe: Biasing Facial Representations Toward Non-Universal Emotions With the Face-STN. IEEE Access 10, 103932-103947. DOI: 10.1109/ACCESS.2022.3210183. Online publication date: 2022.
  • (2021) Multimodal Emotion Recognition in Deep Learning: a Survey. 2021 International Conference on Culture-oriented Science & Technology (ICCST), 77-82. DOI: 10.1109/ICCST53801.2021.00027. Online publication date: Nov-2021.
  • (2021) User-generated video emotion recognition based on key frames. Multimedia Tools and Applications 80(9), 14343-14361. DOI: 10.1007/s11042-020-10203-1. Online publication date: 1-Apr-2021.
  • (2020) WSCNet: Weakly Supervised Coupled Networks for Visual Sentiment Classification and Detection. IEEE Transactions on Multimedia 22(5), 1358-1371. DOI: 10.1109/TMM.2019.2939744. Online publication date: May-2020.
  • (2020) A Multi-Task Neural Approach for Emotion Attribution, Classification, and Summarization. IEEE Transactions on Multimedia 22(1), 148-159. DOI: 10.1109/TMM.2019.2922129. Online publication date: 1-Jan-2020.
  • (2020) A probability and integrated learning based classification algorithm for high-level human emotion recognition problems. Measurement 150, 107049. DOI: 10.1016/j.measurement.2019.107049. Online publication date: Jan-2020.
  • (2019) Deep Temporal-Spatial Aggregation for Video-Based Facial Expression Recognition. Symmetry 11(1), 52. DOI: 10.3390/sym11010052. Online publication date: 5-Jan-2019.
  • (2019) Multiple Level Hierarchical Network-Based Clause Selection for Emotion Cause Extraction. IEEE Access 7, 9071-9079. DOI: 10.1109/ACCESS.2018.2890390. Online publication date: 2019.
