DOI: 10.1145/3581783.3612186
Research Article

Progressive Visual Content Understanding Network for Image Emotion Classification

Published: 27 October 2023

Abstract

Most existing methods for image emotion classification extract features directly from images under the supervision of a single emotional label. However, this approach suffers from a limitation known as the affective gap: the extracted features do not always align with the emotions perceived by users, which restricts their effectiveness. To bridge the affective gap, this paper proposes a visual content understanding network inspired by the staged process of human emotion perception. The proposed network comprises three perception modules designed to extract multi-level information. First, an entity perception module extracts entities from images. Second, an attribute perception module extracts the attribute content of each entity. Third, an emotion perception module extracts emotion features based on both the entity and attribute information. We generate pseudo-labels for entities and attributes through image segmentation and vision-language models to provide auxiliary guidance for network learning. This progressive understanding of entities and attributes enables the network to hierarchically extract semantic-level features for emotion analysis. Extensive experiments demonstrate that our progressive learning network achieves superior performance on various benchmark datasets for image emotion classification.
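The staged pipeline described in the abstract — entities first, then per-entity attributes, then an emotion decision conditioned on both — can be sketched as follows. This is an illustrative toy, not the authors' implementation: all class names, the stub outputs, and the keyword-based emotion rule are assumptions standing in for the paper's segmentation model, vision-language model, and learned emotion module.

```python
class EntityPerception:
    """Stage 1: extract entities from the image.

    In the paper this is driven by an image segmentation model producing
    entity pseudo-labels; here we return a fixed stub result.
    """
    def extract(self, image):
        return ["person", "dog"]  # placeholder entities


class AttributePerception:
    """Stage 2: describe the attribute content of each entity.

    In the paper, attribute pseudo-labels come from a vision-language
    model; here a hypothetical lookup stands in for that step.
    """
    def extract(self, image, entities):
        stub_attrs = {"person": ["smiling"], "dog": ["running"]}
        return {e: stub_attrs.get(e, []) for e in entities}


class EmotionPerception:
    """Stage 3: derive an emotion from entity + attribute information.

    The real module is a learned network; this toy applies a keyword rule
    purely to show how both earlier stages feed the final decision.
    """
    def classify(self, entities, attributes):
        cues = {a for attrs in attributes.values() for a in attrs}
        return "positive" if "smiling" in cues else "negative"


def classify_emotion(image):
    """Run the three perception stages progressively, each conditioning
    on the output of the previous one."""
    entities = EntityPerception().extract(image)
    attributes = AttributePerception().extract(image, entities)
    return EmotionPerception().classify(entities, attributes)
```

The point of the structure is that the emotion stage never sees raw pixels alone: it receives semantic-level entity and attribute information, which is how the progressive design aims to narrow the affective gap.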


Cited By

  • (2024) "A Multi-Stage Visual Perception Approach for Image Emotion Analysis", IEEE Transactions on Affective Computing 15(3), 1786-1799. DOI: 10.1109/TAFFC.2024.3372090. Online publication date: 8 March 2024.
  • (2024) "Large Multimodal Models Thrive with Little Data for Image Emotion Prediction", Pattern Recognition, 298-313. DOI: 10.1007/978-3-031-78107-0_19. Online publication date: 2 December 2024.


Published In

MM '23: Proceedings of the 31st ACM International Conference on Multimedia
October 2023
9913 pages
ISBN:9798400701085
DOI:10.1145/3581783
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. image emotion classification
  2. staged emotion perception
  3. visual content understanding


Conference

MM '23
Sponsor:
MM '23: The 31st ACM International Conference on Multimedia
October 29 - November 3, 2023
Ottawa ON, Canada

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
