Abstract
Image sentiment analysis aims to use a computational model to automatically discover the emotions implied by an image, a capability that is crucial in many practical applications. Psychological findings indicate that emotional content is usually concentrated in a few informative regions, and that these regions convey different emotional polarities and intensities. The emotion conveyed by the whole image can therefore be regarded as the combined effect of its positive-polarity and negative-polarity emotional regions. Motivated by this psychological prior, we propose a new polarity-aware attention network for end-to-end image sentiment analysis. Specifically, the proposed network is composed of a sentimental feature extraction backbone, a polarity-aware attention module, and a fused classification module. The backbone extracts global contextual features. The polarity-aware attention module not only attends to positive and negative emotional regions by predicting polarity-aware attention maps, but also estimates their polarity intensities. The fused classification module integrates the outputs of the first two modules to produce the final image sentiment prediction. A compound loss function guides network learning in a weakly supervised manner, using a distributed label smoothing learning method. We validate our method on multiple benchmarks, and the experimental results demonstrate that it outperforms several state-of-the-art methods.
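To make the data flow between the three modules concrete, the following is a minimal sketch of a forward pass, not the authors' implementation. The abstract does not specify the internals, so several choices here are assumptions made purely for illustration: the softmax spatial attention, the sigmoid intensity heads, fusion by concatenation, and every name (`attention_pool`, `polarity_aware_forward`, `w_pos`, `w_neg`, `v_pos`, `v_neg`, `w_cls`). The backbone output is replaced by a hand-written 4-location, 2-channel feature map.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    # Subtract the maximum for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention_pool(features, w):
    """Score every spatial location, normalize the scores into an
    attention map, and return the map with the attention-weighted feature."""
    attn = softmax([dot(f, w) for f in features])
    dim = len(features[0])
    pooled = [sum(a * f[c] for a, f in zip(attn, features)) for c in range(dim)]
    return attn, pooled


def polarity_aware_forward(features, params):
    """features: list of per-location feature vectors (stand-in for the backbone output)."""
    n, dim = len(features), len(features[0])
    # Global contextual feature (assumed here: average pooling over locations).
    global_feat = [sum(f[c] for f in features) / n for c in range(dim)]
    # One attention map per polarity (assumed: linear scoring + softmax).
    pos_map, pos_feat = attention_pool(features, params["w_pos"])
    neg_map, neg_feat = attention_pool(features, params["w_neg"])
    # Polarity intensities in (0, 1) (assumed: sigmoid heads).
    pos_int = sigmoid(dot(pos_feat, params["v_pos"]))
    neg_int = sigmoid(dot(neg_feat, params["v_neg"]))
    # Fusion (assumed: concatenate global and intensity-scaled polarity features).
    fused = (global_feat
             + [pos_int * x for x in pos_feat]
             + [neg_int * x for x in neg_feat])
    probs = softmax([dot(fused, row) for row in params["w_cls"]])
    return {"pos_map": pos_map, "neg_map": neg_map,
            "pos_intensity": pos_int, "neg_intensity": neg_int,
            "probs": probs}


# Hypothetical toy input: 4 spatial locations, 2 channels each.
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.0, 0.0]]
# Hypothetical parameters; in a trained network these would be learned.
params = {
    "w_pos": [2.0, 0.0],
    "w_neg": [0.0, 2.0],
    "v_pos": [1.0, 0.0],
    "v_neg": [0.0, 1.0],
    "w_cls": [[0.5, -0.5, 1.0, 0.0, -1.0, 0.0],   # class 0: positive
              [-0.5, 0.5, -1.0, 0.0, 1.0, 0.0]],  # class 1: negative
}
out = polarity_aware_forward(features, params)
```

In this toy setup the positive attention map peaks at the first location and the negative map at the second, because the hypothetical scoring vectors each respond to a single channel. In a weakly supervised setting, such maps would have to be learned from image-level labels only.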
Acknowledgements
This work is partially supported by the National Natural Science Foundation of China under Grants 62276139, U2001211, and 61802199, and by the Basic Research Program of Jiangsu Province (Frontier Leading Technology) under Grant BK20192004.
Additional information
Communicated by J. Gao.
About this article
Cite this article
Yan, Q., Sun, Y., Fan, S. et al. Polarity-aware attention network for image sentiment analysis. Multimedia Systems 29, 389–399 (2023). https://doi.org/10.1007/s00530-022-00935-5
DOI: https://doi.org/10.1007/s00530-022-00935-5