ABSTRACT
Most existing work on image emotion recognition either classifies an image into a single dominant emotion category or regresses average dimension values, on the assumption that different viewers perceive largely the same emotions. However, personal and situational factors, such as cultural background and social interactions, can lead different viewers to react quite differently to the same image. In this paper, we formulate image emotion recognition as a probability distribution learning problem. Motivated by the fact that image emotions can be conveyed through different visual features, such as aesthetics and semantics, we present a novel framework that fuses multi-modal features to tackle this problem. Specifically, a weighted multi-modal conditional probability neural network (WMMCPNN) is designed as the learning model to associate the visual features with emotion probabilities. By jointly exploring the complementarity of the different modality features and learning their optimal combination coefficients, WMMCPNN can effectively exploit the representation ability of each uni-modal feature. We conduct extensive experiments on three publicly available benchmarks, and the results demonstrate that the proposed method significantly outperforms state-of-the-art approaches for emotion distribution prediction.
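To make the idea of combining modality-specific predictions with learned coefficients more concrete, the following is a minimal sketch, not the WMMCPNN model itself. It assumes each modality (for example aesthetics or semantics) has already produced its own emotion probability distribution, fuses them as a convex combination, and scores the result against a ground-truth distribution with KL divergence. The function names, the convex-combination form, and the choice of KL divergence are illustrative assumptions and are not taken from the paper.

```python
import math


def fuse_distributions(per_modality, weights):
    """Convex combination of per-modality emotion distributions.

    per_modality: list of M distributions, each a list of C probabilities.
    weights: list of M non-negative combination coefficients (assumed given;
             in a learned model these would be optimized during training).
    """
    if not per_modality or len(per_modality) != len(weights):
        raise ValueError("need exactly one weight per modality")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")

    # Normalize the coefficients so they sum to one.
    total_w = sum(weights)
    norm_w = [w / total_w for w in weights]

    n_classes = len(per_modality[0])
    fused = [
        sum(w * dist[c] for w, dist in zip(norm_w, per_modality))
        for c in range(n_classes)
    ]

    # Renormalize to guard against inputs that do not sum exactly to one.
    total = sum(fused)
    return [v / total for v in fused]


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted); terms with zero target mass contribute zero."""
    return sum(
        t * math.log(max(t, eps) / max(q, eps))
        for t, q in zip(target, predicted)
        if t > 0
    )


# Hypothetical two-emotion example with two modalities.
aesthetic_pred = [0.5, 0.5]
semantic_pred = [1.0, 0.0]
fused = fuse_distributions([aesthetic_pred, semantic_pred], weights=[1.0, 1.0])
```

With equal weights the fused distribution is the average of the two modality predictions; shifting weight toward one modality moves the fused distribution toward that modality's prediction, which is the role the combination coefficients play in this sketch.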