Incorporating structured emotion commonsense knowledge and interpersonal relation into context-aware emotion recognition


Abstract

Traditional emotion recognition technology often focuses on human biometrics such as facial expressions or body postures. However, psychological research shows that context (contextual information) also plays an important role in perceiving the emotions of others. Existing methods that exploit contextual information rely heavily on the semantic features of images: they do not model the interrelationships between objects and fail to use external knowledge, even though such knowledge is likely to be very helpful for perceiving emotion. In this paper, by incorporating external structured emotion commonsense knowledge, we propose two methods for constructing emotion knowledge graphs from the objective text of images and design a multi-modal emotion recognition model. The model has three branches: one focuses on human biometrics, while the other two employ emotion knowledge graphs to perceive emotion from contextual information. Before constructing the emotion knowledge graphs, we convert the visual content into text to obtain concise yet ample contextual information about the objects, the scene, and the relationships between objects; this reduces redundant and invalid information. The structured emotion commonsense knowledge is then integrated into the objective text through word sharing, and two graphs are constructed: a large-scale emotion knowledge graph built over all valid words (LEKG) and a small-scale emotion knowledge graph built over the document itself (TEKG). We further propose two fusion modules: one is attention-based, and the other is a deep reasoning module that incorporates interpersonal relations. Extensive experiments on the benchmark EMOTIC dataset show that our method outperforms state-of-the-art methods, with clear advantages in global context-aware tasks.
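The paper does not include reference code, so the following Python sketch is only a rough illustration of the word-sharing idea: words from an image's objective text become nodes of a small document-level graph in the spirit of TEKG, neighbouring words are linked as a crude proxy for co-occurrence, any word that appears in a toy emotion commonsense lexicon is linked to its emotion concept, and node features are then propagated with a single symmetric-normalised graph-convolution step. Every name here (`EMOTION_LEXICON`, `build_tekg`, the example caption, the feature sizes) is an illustrative assumption, not the authors' implementation.

```python
import numpy as np

# Toy stand-in for structured emotion commonsense knowledge
# (a real system would load thousands of <concept, emotion> entries
# from a resource such as SenticNet or ConceptNet).
EMOTION_LEXICON = {
    "smile": "joy",
    "hug": "affection",
    "rain": "sadness",
    "crowd": "excitement",
}

def build_tekg(objective_text):
    """Build a small document-level emotion knowledge graph (TEKG-style).

    Nodes are the unique words of the objective text plus any emotion
    concepts shared with the lexicon; edges connect (i) neighbouring
    words, as a crude proxy for co-occurrence, and (ii) each word to
    the emotion concept it shares with the knowledge base.
    """
    words = objective_text.lower().split()
    nodes = list(dict.fromkeys(words))            # unique words, order kept
    edges = set()

    # (i) co-occurrence edges between neighbouring words
    for a, b in zip(words, words[1:]):
        if a != b:
            edges.add((a, b))

    # (ii) word-sharing edges into the emotion commonsense knowledge
    for word in list(nodes):
        concept = EMOTION_LEXICON.get(word)
        if concept is not None:
            if concept not in nodes:
                nodes.append(concept)
            edges.add((word, concept))

    index = {n: i for i, n in enumerate(nodes)}
    adj = np.eye(len(nodes))                      # self-loops
    for a, b in edges:
        adj[index[a], index[b]] = adj[index[b], index[a]] = 1.0
    return nodes, adj

def gcn_layer(adj, feats, weight):
    """One symmetric-normalised graph convolution followed by a ReLU."""
    d_inv_sqrt = np.diag(1.0 / np.sqrt(adj.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ adj @ d_inv_sqrt @ feats @ weight, 0.0)

if __name__ == "__main__":
    caption = "a man smile and hug a friend in the rain"  # objective text
    nodes, adj = build_tekg(caption)
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(len(nodes), 8))      # stand-in word embeddings
    weight = rng.normal(size=(8, 8))
    out = gcn_layer(adj, feats, weight)
    print(nodes)          # words plus injected emotion concepts
    print(out.shape)      # (num_nodes, 8)
```

In the full model, the large-scale LEKG would presumably be built once over the whole valid vocabulary rather than per document, node features would come from pre-trained word embeddings such as GloVe, and the resulting graph representation would be fused with the biometric and context branches through the attention-based or interpersonal-relation reasoning modules described above.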



Data Availability

The data that support the findings of this study are available from http://sunai.uoc.edu/emotic/. Restrictions apply to the availability of these data, which were used under license for the current study, and so they are not publicly available. The data are, however, available from the authors upon reasonable request and with the permission of the corresponding author of the EMOTIC dataset.

References

  1. Zhou B, Lapedriza A, Khosla A, Oliva A, Torralba A (2018) Places: A 10 million image database for scene recognition. IEEE Trans Pattern Anal Mach Intell 40(6):1452–1464. https://doi.org/10.1109/TPAMI.2017.2723009

  2. Karpathy A, Fei-Fei L (2015) Deep visual-semantic alignments for generating image descriptions. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 3128–3137

  3. Barrett L F, Mesquita B, Gendron M (2011) Context in emotion perception. Curr Dir Psychol Sci 20(5):286–290

  4. Barrett L F (2017) How emotions are made: The secret life of the brain. Houghton Mifflin Harcourt

  5. Kosti R, Alvarez J M, Recasens A, Lapedriza A (2017) Emotion recognition in context. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 1960–1968

  6. Kosti R, Alvarez J M, Recasens A, Lapedriza A (2020) Context based emotion recognition using EMOTIC dataset. IEEE Trans Pattern Anal Mach Intell 42(11):2755–2766. https://doi.org/10.1109/TPAMI.2019.2916866

  7. Mittal T, Guhan P, Bhattacharya U, Chandra R, Bera A, Manocha D (2020) EmotiCon: Context-aware multimodal emotion recognition using Frege's principle. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 14222–14231

  8. Zhang M, Liang Y, Ma H (2019) Context-aware affective graph reasoning for emotion recognition. In: 2019 IEEE International Conference on Multimedia and Expo (ICME), pp 151–156

  9. Hall E T (1963) A system for the notation of proxemic behavior. Am Anthropol 65(5):1003–1026

  10. Sommer R (1959) Studies in personal space. Sociometry 22(3):247–260

  11. Kendon A (1990) Conducting interaction: Patterns of behavior in focused encounters. Cambridge University Press, New York, NY, US

  12. Yang P, Li L, Luo F, Liu T, Sun X (2019) Enhancing topic-to-essay generation with external commonsense knowledge. In: Proceedings of the 57th annual meeting of the association for computational linguistics, pp 2002–2012

  13. Zhong P, Wang D, Miao C (2019) Knowledge-enriched transformer for emotion detection in textual conversations. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP), pp 165–176

  14. Cambria E, Li Y, Xing F Z, Poria S, Kwok K (2020) SenticNet 6: Ensemble application of symbolic and subsymbolic AI for sentiment analysis. In: CIKM ’20: The 29th ACM international conference on information and knowledge management, pp 105–114

  15. Liu Z, Niu Z-Y, Wu H, Wang H (2019) Knowledge aware conversation generation with explainable reasoning over augmented graphs. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP). Association for Computational Linguistics, pp 1782–1792

  16. Bi B, Wu C, Yan M, Wang W, Xia J, Li C (2019) Incorporating external knowledge into machine reading for generative question answering. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP). Association for Computational Linguistics, pp 2521–2530

  17. Lei Z, Yang Y, Yang M (2018) SAAN: A sentiment-aware attention network for sentiment analysis. In: The 41st international ACM SIGIR conference on research & development in information retrieval, pp 1197–1200

  18. Margatina K, Baziotis C, Potamianos A (2019) Attention-based conditioning methods for external knowledge integration. In: Proceedings of the 57th annual meeting of the association for computational linguistics. Association for Computational Linguistics, Florence, Italy, pp 3944–3951. https://doi.org/10.18653/v1/P19-1385

  19. Bao L, Lambert P, Badia T (2019) Attention and lexicon regularized LSTM for aspect-based sentiment analysis. In: Proceedings of the 57th annual meeting of the association for computational linguistics: Student research workshop, pp 253–259

  20. Ma Y, Peng H, Cambria E (2018) Targeted aspect-based sentiment analysis via embedding commonsense knowledge into an attentive LSTM. In: Thirty-second AAAI conference on artificial intelligence

  21. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780

  22. Zhang B, Yang M, Li X, Ye Y, Xu X, Dai K (2020) Enhancing cross-target stance detection with transferable semantic-emotion knowledge. In: Proceedings of the 58th annual meeting of the association for computational Linguistics, pp 3188–3197

  23. Ghosal D, Hazarika D, Roy A, Majumder N, Mihalcea R, Poria S (2020) KinGDOM: Knowledge-guided domain adaptation for sentiment analysis. In: Proceedings of the 58th annual meeting of the association for computational linguistics, pp 3198–3210

  24. Speer R, Chin J, Havasi C (2017) ConceptNet 5.5: An open multilingual graph of general knowledge. In: Singh S, Markovitch S (eds) Proceedings of the thirty-first AAAI conference on artificial intelligence, February 4-9, 2017, San Francisco, California, USA. AAAI Press, pp 4444–4451. http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14972

  25. Qi F, Yang X, Xu C (2021) Emotion knowledge driven video highlight detection. IEEE Transactions on Multimedia 23:3999–4013. https://doi.org/10.1109/TMM.2020.3035285

  26. Ren S, He K, Girshick R, Sun J (2017) Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(6):1137–1149. https://doi.org/10.1109/TPAMI.2016.2577031

  27. Kipf T N, Welling M (2016) Semi-supervised classification with graph convolutional networks. arXiv:1609.02907

  28. Thuseethan S, Rajasegarar S, Yearwood J (2021) Boosting emotion recognition in context using non-target subject information. In: 2021 International Joint Conference on Neural Networks (IJCNN), pp 1–7

  29. Cornia M, Stefanini M, Baraldi L, Cucchiara R (2020) Meshed-memory transformer for image captioning. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 10575–10584

  30. Chen Z, Wei X S, Wang P, Guo Y (2021) Learning graph convolutional networks for multi-label recognition and applications. IEEE Trans Pattern Anal Mach Intell. https://doi.org/10.1109/TPAMI.2021.3063496

  31. Susanto Y, Livingstone A G, Ng B C, Cambria E (2020) The hourglass model revisited. IEEE Intell Syst 35(5):96–102

  32. Cambria E, Livingstone A, Hussain A (2012) The hourglass of emotions. In: Cognitive behavioural systems. Springer, pp 144–157

  33. Zhang Y, Yu X, Cui Z, Wu S, Wen Z, Wang L (2020) Every document owns its structure: Inductive text classification via graph neural networks. In: Jurafsky D, Chai J, Schluter N, Tetreault JR (eds) Proceedings of the 58th annual meeting of the association for computational linguistics, ACL 2020, Online, July 5-10, 2020. Association for Computational Linguistics, pp 334–339. https://doi.org/10.18653/v1/2020.acl-main.31

  34. Li Y, Tarlow D, Brockschmidt M, Zemel RS (2016) Gated graph sequence neural networks. In: Bengio Y, LeCun Y (eds) 4th International conference on learning representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference track proceedings. arXiv:1511.05493

  35. Cao Z, Hidalgo G, Simon T, Wei S E, Sheikh Y (2021) OpenPose: Realtime multi-person 2D pose estimation using part affinity fields. IEEE Trans Pattern Anal Mach Intell 43(1):172–186. https://doi.org/10.1109/TPAMI.2019.2929257

  36. Wang Z, Chen T, Ren JSJ, Yu W, Cheng H, Lin L (2018) Deep reasoning with knowledge graph for social relationship understanding. In: Lang J (ed) Proceedings of the twenty-seventh international joint conference on artificial intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden. ijcai.org, pp 1021–1028. https://doi.org/10.24963/ijcai.2018/142

  37. Pennington J, Socher R, Manning C D (2014) GloVe: Global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 1532–1543

  38. Bhattacharya U, Mittal T, Chandra R, Randhavane T, Manocha D (2020) Step: Spatial temporal graph convolutional networks for emotion perception from gaits. Proceedings of the AAAI Conference on Artificial Intelligence 34(2):1342–1350

  39. Wang M, Zheng D, Ye Z, Gan Q, Li M, Song X, Zhou J, Ma C, Yu L, Gai Y, Xiao T, He T, Karypis G, Li J, Zhang Z (2019) Deep graph library: A graph-centric, highly-performant package for graph neural networks. arXiv:1909.01315

  40. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778

  41. Hamilton W L, Ying R, Leskovec J (2017) Inductive representation learning on large graphs. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, pp 1025–1035

  42. Veličković P, Cucurull G, Casanova A, Romero A, Liò P, Bengio Y (2018) Graph attention networks. In: 6th International conference on learning representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference track proceedings. OpenReview.net. https://openreview.net/forum?id=rJXMpikCZ

Acknowledgements

This study is supported by the National Natural Science Foundation of China (No. 61573114) and by the Science and Technology on Underwater Test and Control Laboratory under Grant YS24071804.

Author information

Corresponding author

Correspondence to Kejun Wang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chen, J., Yang, T., Huang, Z. et al. Incorporating structured emotion commonsense knowledge and interpersonal relation into context-aware emotion recognition. Appl Intell 53, 4201–4217 (2023). https://doi.org/10.1007/s10489-022-03729-4
