Zero-Shot Visual Emotion Recognition by Exploiting BERT

  • Conference paper
Intelligent Systems and Applications (IntelliSys 2022)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 543)

Abstract

The explosive growth of multimedia has led many people to express their opinions through social media platforms such as Flickr and Facebook. As a result, social media has become a rich source of data for analyzing human emotions. Many earlier studies have sought to assess human emotions automatically because of their wide range of applications, such as education, advertisement, and entertainment. Recently, researchers have focused on visual content to find cues that evoke emotions; in the literature, this type of study is called visual sentiment analysis. Although many earlier studies on visual emotion analysis have achieved strong performance, most are limited to classification tasks with pre-determined emotion categories. In this paper, we aim to recognize emotion classes that do not exist in the training set. The proposed model is trained by mapping visual features to the emotional semantic representations embedded by the BERT language model. Evaluating the model on a cross-domain affective dataset, we achieve 66% accuracy in predicting unseen emotions not included in the training set.
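
The paper trains a projection from visual features into BERT's semantic space and recognizes unseen emotions by similarity in that space. The sketch below illustrates this general recipe; it is a minimal, hypothetical reconstruction, assuming a ResNet-50 visual encoder, the `bert-base-uncased` checkpoint from the Hugging Face `transformers` library, and a cosine-similarity objective. The projection head, temperature, and label sets are illustrative choices, not the authors' exact design.

```python
import torch
import torch.nn as nn
from torchvision import models
from transformers import BertModel, BertTokenizer

# Frozen BERT encoder supplies the emotional semantic space.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def emotion_embedding(word: str) -> torch.Tensor:
    """Embed an emotion word with BERT (mean-pooled last hidden state)."""
    inputs = tokenizer(word, return_tensors="pt")
    return bert(**inputs).last_hidden_state.mean(dim=1).squeeze(0)  # (768,)

# Visual encoder: ResNet-50 features projected into the 768-d BERT space.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = nn.Identity()          # expose the 2048-d pooled features
project = nn.Linear(2048, 768)       # learned visual-to-semantic mapping

def visual_embedding(images: torch.Tensor) -> torch.Tensor:
    """Map a batch of images (B, 3, 224, 224) into the semantic space."""
    return project(backbone(images))  # (B, 768)

# Training signal on *seen* classes: treat cosine similarities to the
# class-name embeddings as logits and apply cross-entropy.
seen = ["joy", "anger", "sadness", "fear"]            # illustrative labels
seen_emb = torch.stack([emotion_embedding(w) for w in seen])

def loss_fn(img_emb: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    sims = nn.functional.cosine_similarity(
        img_emb.unsqueeze(1), seen_emb.unsqueeze(0), dim=-1)  # (B, n_seen)
    return nn.functional.cross_entropy(sims / 0.1, labels)    # temperature 0.1

# Zero-shot inference: score images against embeddings of *unseen* words.
unseen = ["awe", "contentment"]                       # illustrative labels
unseen_emb = torch.stack([emotion_embedding(w) for w in unseen])

@torch.no_grad()
def predict(images: torch.Tensor) -> list:
    sims = nn.functional.cosine_similarity(
        visual_embedding(images).unsqueeze(1), unseen_emb.unsqueeze(0), dim=-1)
    return [unseen[i] for i in sims.argmax(dim=1)]
```

Because the class labels themselves are embedded, swapping the unseen label list changes the classifier without retraining the visual pathway; this is the property that lets an embedding-based model predict emotion classes absent from the training set.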

Acknowledgment

This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the High-Potential Individuals Global Training Program (2021-0-01549) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation).

Author information

Corresponding author

Correspondence to Jihie Kim.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Kang, H., Hazarika, D., Kim, D., Kim, J. (2023). Zero-Shot Visual Emotion Recognition by Exploiting BERT. In: Arai, K. (ed.) Intelligent Systems and Applications. IntelliSys 2022. Lecture Notes in Networks and Systems, vol 543. Springer, Cham. https://doi.org/10.1007/978-3-031-16078-3_33
