Abstract
Large-scale pre-trained models have recently achieved notable success on various downstream tasks by using contrastive learning on image-text pairs to learn high-quality, general visual representations from natural language supervision. However, these models typically disregard sentiment knowledge during pre-training, which limits their performance on image sentiment analysis. To address this limitation, we propose a sentiment-enriched continual training framework (SECT), which continues training CLIP and introduces multi-level sentiment knowledge during this further pre-training through sentiment-based natural language supervision. Moreover, we construct a large-scale weakly annotated sentiment image-text dataset so that the model can be trained robustly. In addition, SECT uses three training objectives that integrate multi-level sentiment knowledge into the training process. Experiments on the EmotionROI, FI, and Twitter I datasets show that the model pre-trained with SECT outperforms previous models and CLIP on most of the downstream datasets. Our code will be made publicly available for research purposes.
Acknowledgment
This work was supported in part by the National Natural Science Foundation of China under Grants No. 62236010, 61976010, 62106011, 62106010, and 62176011.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wu, L., Xing, L., Shi, G., Deng, S., Yang, J. (2023). SECT: Sentiment-Enriched Continual Training for Image Sentiment Analysis. In: Lu, H., et al. Image and Graphics. ICIG 2023. Lecture Notes in Computer Science, vol 14355. Springer, Cham. https://doi.org/10.1007/978-3-031-46305-1_8
DOI: https://doi.org/10.1007/978-3-031-46305-1_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-46304-4
Online ISBN: 978-3-031-46305-1
eBook Packages: Computer Science, Computer Science (R0)