Abstract
Traditional emotion recognition technology focuses on recognizing human biometrics such as facial expressions or body postures. However, psychological research shows that context (contextual information) also plays an important role in perceiving the emotions of others. Existing methods based on contextual information rely heavily on the semantic features of images: they neither model the interrelationships between objects nor exploit external knowledge, even though such knowledge is likely to be very helpful for perceiving emotion. In this paper, by incorporating external structured emotion commonsense knowledge, we propose two methods for constructing emotion knowledge graphs from the objective text of images and design a multi-modal emotion recognition model. The model has three branches: one focuses on human biometrics, and the other two employ emotion knowledge graphs to perceive emotion from contextual information. Before constructing the emotion knowledge graphs, we convert the visual content into text to obtain concise yet ample contextual information about the objects, the scene, and the relationships between objects; this conversion reduces redundant and invalid information. The structured emotion commonsense knowledge is then integrated into the objective text through word sharing, and two graphs are constructed: a large-scale emotion knowledge graph based on all valid words (LEKG) and a small-scale emotion knowledge graph based on the document itself (TEKG). We further propose two fusion modules: one is attention-based, and the other is a deep reasoning module that incorporates interpersonal relations. Extensive experiments on the benchmark EMOTIC dataset show that our method outperforms state-of-the-art approaches and has clear advantages on global context-aware tasks.
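To make the word-sharing construction concrete, the following is a minimal, hypothetical sketch of building a small document-level emotion knowledge graph (in the spirit of TEKG) from an image's objective text. The tiny `EMOTION_LEXICON` dictionary is a stand-in for real structured commonsense resources such as SenticNet or ConceptNet, and all names here are illustrative, not the paper's actual implementation.

```python
# Hypothetical sketch: build a small document-level emotion knowledge graph
# by word sharing. Words from the objective text are nodes; each word is also
# linked to any emotion concepts it shares with a toy commonsense lexicon.

from collections import defaultdict

# Toy stand-in for structured emotion commonsense knowledge
# (a real system would query SenticNet / ConceptNet instead).
EMOTION_LEXICON = {
    "smile": {"joy"},
    "crowd": {"excitement", "anxiety"},
    "rain": {"sadness"},
}

def build_tekg(objective_text: str) -> dict:
    """Return an adjacency dict: node -> set of linked nodes (words + concepts)."""
    words = objective_text.lower().split()
    graph = defaultdict(set)
    # Sequential edges keep local context between neighbouring words.
    for a, b in zip(words, words[1:]):
        graph[a].add(b)
        graph[b].add(a)
    # Word-sharing edges inject emotion concepts as additional nodes.
    for w in words:
        for concept in EMOTION_LEXICON.get(w, ()):
            graph[w].add(concept)
            graph[concept].add(w)
    return dict(graph)

g = build_tekg("a smile in the rain near a crowd")
```

The resulting graph could then be fed to a graph neural network (e.g. a GCN) for reasoning; the sketch only covers the graph-construction step described in the abstract.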
Data Availability
The data that support the findings of this study are available from http://sunai.uoc.edu/emotic/, but restrictions apply to the availability of these data, which were used under license for the current study, and so they are not publicly available. Data are, however, available from the authors upon reasonable request and with the permission of the corresponding author of the EMOTIC dataset.
Acknowledgements
This study is supported by the National Natural Science Foundation of China (No. 61573114) and the Science and Technology on Underwater Test and Control Laboratory (Grant No. YS24071804).
Cite this article
Chen, J., Yang, T., Huang, Z. et al. Incorporating structured emotion commonsense knowledge and interpersonal relation into context-aware emotion recognition. Appl Intell 53, 4201–4217 (2023). https://doi.org/10.1007/s10489-022-03729-4