DOI: 10.1145/3123266.3130858
Research article

Learning Visual Emotion Distributions via Multi-Modal Features Fusion

Published: 19 October 2017

ABSTRACT

Existing work on image emotion recognition has mainly classified images into a single dominant emotion category, or regressed them to average dimensional values, under the assumption that the emotions perceived by different viewers largely agree with one another. However, owing to various personal and situational factors, such as cultural background and social interactions, different viewers may react to the same image in entirely different ways emotionally. In this paper, we propose to formulate image emotion recognition as a probability distribution learning problem. Motivated by the fact that image emotions can be conveyed through different kinds of visual features, such as aesthetics and semantics, we present a novel framework that fuses multi-modal features to tackle this problem. Specifically, a weighted multi-modal conditional probability neural network (WMMCPNN) is designed as the learning model that associates visual features with emotion probabilities. By jointly exploiting the complementarity of the different modality features and learning their optimal combination coefficients, WMMCPNN can effectively utilize the representational ability of each uni-modal feature. We conduct extensive experiments on three publicly available benchmarks, and the results demonstrate that the proposed method significantly outperforms state-of-the-art approaches to emotion distribution prediction.
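The abstract describes the model only at a high level, and the paper's code is not reproduced here. As a rough illustration of the general idea (per-modality predictors whose outputs are fused through learned, normalized combination coefficients and trained against annotator emotion distributions), here is a minimal sketch in PyTorch. All names, layer choices, feature dimensions, the eight-category emotion space, and the KL-divergence loss are assumptions made for illustration, not the authors' actual WMMCPNN.

    # Hypothetical sketch: weighted fusion of per-modality emotion distributions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class WeightedMultiModalFusion(nn.Module):
        def __init__(self, modality_dims, num_emotions=8):
            super().__init__()
            # One predictor per modality (e.g. aesthetic, semantic, deep CNN
            # features), each mapping its feature vector to emotion scores.
            self.branches = nn.ModuleList(
                nn.Linear(d, num_emotions) for d in modality_dims
            )
            # Learnable combination coefficients, one per modality; a softmax
            # keeps them positive and summing to one.
            self.weights = nn.Parameter(torch.zeros(len(modality_dims)))

        def forward(self, features):
            # features[m] has shape (batch, modality_dims[m]).
            per_modality = torch.stack(
                [F.softmax(b(x), dim=-1) for b, x in zip(self.branches, features)],
                dim=0,
            )  # (num_modalities, batch, num_emotions)
            w = F.softmax(self.weights, dim=0).view(-1, 1, 1)
            # Convex combination of per-modality distributions, so each row
            # of the output is itself a valid probability distribution.
            return (w * per_modality).sum(dim=0)

    # Toy usage with random features and a random target distribution.
    model = WeightedMultiModalFusion(modality_dims=[512, 128, 2048])
    feats = [torch.randn(4, d) for d in (512, 128, 2048)]
    pred = model(feats)                                   # (4, 8), rows sum to 1
    target = F.softmax(torch.randn(4, 8), dim=-1)         # stand-in annotations
    loss = F.kl_div(pred.clamp_min(1e-8).log(), target, reduction="batchmean")
    loss.backward()

Because the fusion weights are trained jointly with the per-modality branches, gradients from the distribution loss adjust both how each modality predicts and how much each modality contributes, which is one plausible reading of the "optimal combination coefficients" the abstract refers to.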

