Abstract
Images are powerful carriers of affective content, and image emotion recognition has applications in graphics, gaming, animation, entertainment, and cinematography. This paper proposes a technique for recognizing emotions in images containing facial, non-facial, and non-human components. Emotion-labeled images are first mapped to their corresponding textual captions. These captions are then used to re-train a text emotion recognition model as a domain-adaptation step, and the adapted model classifies the captions into discrete emotion classes. Because each caption has a one-to-one mapping with its image, the emotion label predicted for a caption is taken as the emotion label of the image. The suitability of the captions for emotion classification is assessed using standard caption-evaluation metrics. The proposed approach offers a way around the scarcity of emotion-labeled image datasets and pre-trained models, and achieves an accuracy of 59.17% for image emotion recognition.
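The pipeline described in the abstract can be sketched in a few lines. This is a minimal illustration only: the captioning model and the domain-adapted text emotion classifier are replaced here with toy keyword rules (all names, keywords, and example captions are hypothetical), whereas the paper uses a trained image-captioning network and a re-trained text emotion recognition model.

```python
# Toy keyword sets standing in for a trained text emotion classifier.
EMOTION_KEYWORDS = {
    "happy": {"smiling", "party", "sunny", "playing"},
    "sad": {"crying", "rain", "alone", "grave"},
    "angry": {"fighting", "shouting", "storm"},
}

def caption_image(image_id, captions):
    """Stand-in for an image-captioning model: look up a stored caption."""
    return captions[image_id]

def classify_caption(caption):
    """Stand-in for the adapted text emotion model: keyword voting."""
    words = set(caption.lower().split())
    scores = {emo: len(words & kws) for emo, kws in EMOTION_KEYWORDS.items()}
    return max(scores, key=scores.get)

def label_images(captions):
    """Captions map one-to-one to images, so a caption's predicted
    emotion is taken as the emotion label of its image."""
    return {img: classify_caption(caption_image(img, captions))
            for img in captions}

captions = {
    "img1": "children smiling and playing in a sunny park",
    "img2": "a woman crying alone in the rain",
}
print(label_images(captions))  # {'img1': 'happy', 'img2': 'sad'}
```

The key design point carried over from the paper is the indirection: the image is never classified directly; its caption is, and the label is transferred back through the one-to-one image-caption mapping.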
Acknowledgements
This research was supported by the Ministry of Human Resource Development (MHRD) INDIA with reference grant number: 1-3146198040.
Copyright information
© 2021 Springer Nature Singapore Pte Ltd.
Cite this paper
Kumar, P., Raman, B. (2021). Domain Adaptation Based Technique for Image Emotion Recognition Using Image Captions. In: Singh, S.K., Roy, P., Raman, B., Nagabhushan, P. (eds) Computer Vision and Image Processing. CVIP 2020. Communications in Computer and Information Science, vol 1377. Springer, Singapore. https://doi.org/10.1007/978-981-16-1092-9_33
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-1091-2
Online ISBN: 978-981-16-1092-9
eBook Packages: Computer Science (R0)