research-article

Learning Visual Emotion Distributions via Multi-Modal Features Fusion

Authors:
Sicheng Zhao, Tsinghua University, Beijing, China
Guiguang Ding, Tsinghua University, Beijing, China
Yue Gao, Tsinghua University, Beijing, China
Jungong Han, Lancaster University, Lancaster, United Kingdom

MM '17: Proceedings of the 25th ACM international conference on Multimedia, October 2017, Pages 369–377. https://doi.org/10.1145/3123266.3130858

Published: 19 October 2017

ABSTRACT

Most current image emotion recognition methods either classify an image into a single dominant emotion category or regress it to average dimension values, assuming that the emotions perceived by different viewers largely agree. However, because of various personal and situational factors, such as cultural background and social interactions, different viewers may react to the same image in very different ways emotionally. In this paper, we formulate image emotion recognition as a probability distribution learning problem. Motivated by the fact that image emotions can be conveyed through different visual features, such as aesthetics and semantics, we present a novel framework that fuses multi-modal features to tackle this problem. Specifically, we design a weighted multi-modal conditional probability neural network (WMMCPNN) as the learning model that associates visual features with emotion probabilities. By jointly exploring the complementarity of different modality features and learning their optimal combination coefficients, WMMCPNN can effectively utilize the representation ability of each uni-modal feature. Extensive experiments on three publicly available benchmarks demonstrate that the proposed method significantly outperforms state-of-the-art approaches for emotion distribution prediction.
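The abstract rests on two ideas: representing an image's emotion as a discrete probability distribution over categories rather than a single label, and fusing per-modality predictions with learned combination coefficients. The NumPy sketch below illustrates both under stated assumptions; it is not the paper's WMMCPNN. Assumptions: eight emotion categories (as in the Mikels et al. scheme), a conditional-probability-network-style scorer that encodes the label as a one-hot vector appended to the feature, fusion as a softmax-weighted convex combination of per-modality distributions, and KL divergence as the training objective. All function names are hypothetical. For brevity it fits only the fusion weights over fixed per-modality predictions, whereas the abstract describes learning the combination jointly with the network.

```python
import numpy as np


def votes_to_distribution(votes, num_emotions=8):
    """Convert per-viewer emotion labels (ints in [0, num_emotions)) into a
    discrete probability distribution over emotion categories."""
    counts = np.bincount(np.asarray(votes, dtype=int), minlength=num_emotions)
    return counts.astype(float) / counts.sum()


def softmax(z, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = z - np.max(z, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)


def cpnn_distribution(x, W1, b1, w2, b2, num_emotions):
    """CPNN-style predictor: compute a scalar score f(x, y) for every label y
    with one shared network, then normalise the scores over y.

    Assumption: the label y is encoded as a one-hot vector appended to x.
    Shapes: x (D,), W1 (D + C, H), b1 (H,), w2 (H,), b2 scalar, C = num_emotions.
    """
    labels = np.eye(num_emotions)                                # (C, C)
    inputs = np.hstack([np.tile(x, (num_emotions, 1)), labels])  # (C, D + C)
    hidden = np.tanh(inputs @ W1 + b1)                           # (C, H)
    scores = hidden @ w2 + b2                                    # (C,)
    return softmax(scores)                                       # p(y | x)


def mean_kl(p, q, eps=1e-12):
    """Mean KL(p || q) over the rows of (N, C) distribution arrays."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))


def fit_fusion_weights(p_true, q_modal, steps=200, lr=0.5):
    """Learn softmax-normalised fusion weights alpha over M modalities by
    gradient descent on mean KL(p_true || sum_m alpha_m * q_modal[m]).

    p_true: (N, C) ground-truth distributions.
    q_modal: (M, N, C) per-modality predicted distributions (strictly positive).
    """
    theta = np.zeros(q_modal.shape[0])
    for _ in range(steps):
        alpha = softmax(theta)
        fused = np.tensordot(alpha, q_modal, axes=1)             # (N, C)
        # dL/dalpha_m = -mean_n sum_c p[n, c] * q[m, n, c] / fused[n, c]
        g = -np.mean(np.sum(p_true * q_modal / fused, axis=2), axis=1)
        # Chain rule through softmax: dL/dtheta_k = alpha_k * (g_k - alpha . g)
        theta -= lr * alpha * (g - alpha @ g)
    return softmax(theta)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, C = 50, 8
    p_true = softmax(rng.normal(size=(N, C)))
    informative = 0.9 * p_true + 0.1 / C       # close to the ground truth
    noisy = softmax(rng.normal(size=(N, C)))   # unrelated to the ground truth
    alpha = fit_fusion_weights(p_true, np.stack([informative, noisy]))
    print("fusion weights:", alpha)
```

Because the fused prediction is a convex combination of valid distributions, it is itself a valid distribution, and the KL objective is convex in the weights; in the toy run the weight on the informative modality should grow while the noisy modality is down-weighted.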

Index Terms

Learning Visual Emotion Distributions via Multi-Modal Features Fusion
1. Applied computing
  1. Arts and humanities
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Sentiment analysis

Published in
MM '17: Proceedings of the 25th ACM international conference on Multimedia
October 2017
2028 pages
ISBN:9781450349062
DOI:10.1145/3123266
General Chairs: Qiong Liu (FXPAL, USA), Rainer Lienhart (Universität Augsburg, Germany), Haohong Wang (TCL America, USA)
Program Chairs: Sheng-Wei "Kuan-Ta" Chen (Academia Sinica, Taiwan), Susanne Boll (University of Oldenburg, Germany), Phoebe Chen (La Trobe University, Australia), Gerald Friedland (Lawrence Livermore National Lab, USA), Jia Li (Google, USA), Shuicheng Yan (Qihoo 360, China)
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
discrete probability distribution
distribution learning
feature fusion
image emotion
multi-modal conditional probability neural network
Acceptance Rates
MM '17 Paper Acceptance Rate: 189 of 684 submissions, 28%. Overall Acceptance Rate: 995 of 4,171 submissions, 24%.