Multi-modal Sentiment and Emotion Joint Analysis with a Deep Attentive Multi-task Learning Model

Zhang, Yazhou; Rong, Lu; Li, Xiang; Chen, Rui

doi:10.1007/978-3-030-99736-6_35

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13185)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

Emotion is seen as the external expression of sentiment, while sentiment is the essential nature of emotion. The two are tightly entangled, in that each helps in understanding the other, which has led to a new research topic: multi-modal sentiment and emotion joint analysis. There exist two key challenges in this field: multi-modal fusion and multi-task interaction. Most recent approaches treat sentiment analysis and emotion recognition as two independent tasks and fail to model the relationship between them. In this paper, we propose a novel multi-modal multi-task learning model, termed MMT, to address both challenges in a generic way. Specifically, two attention mechanisms are designed: cross-modal attention and cross-task attention. Cross-modal attention models multi-modal feature fusion, while cross-task attention captures the interaction between sentiment analysis and emotion recognition. Finally, we show empirically on two benchmark datasets that the method alleviates these problems and improves performance on the main task, sentiment analysis, with the help of the secondary emotion recognition task.
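
The abstract names cross-modal and cross-task attention but does not give their equations, so the following is only a minimal sketch of the general idea, assuming generic scaled dot-product cross-attention. It is not the MMT architecture itself. All names (`cross_attention`, `text_feats`, `image_feats`, `sentiment_state`, `emotion_state`) and all feature values are hypothetical and chosen purely for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cross_attention(queries, context):
    """Generic scaled dot-product cross-attention (an assumption, not the paper's formula).

    Each query vector attends over the context vectors, which serve as
    both keys and values. Returns the attended vectors and the weights.
    """
    dim = len(context[0])
    scale = math.sqrt(dim)
    attended, weights = [], []
    for q in queries:
        w = softmax([dot(q, c) / scale for c in context])
        mixed = [sum(wi * c[i] for wi, c in zip(w, context)) for i in range(dim)]
        attended.append(mixed)
        weights.append(w)
    return attended, weights


# Hypothetical toy features: 3 text tokens and 2 image regions, 4 dims each.
text_feats = [[1.0, 0.0, 0.5, 0.2], [0.1, 0.9, 0.3, 0.4], [0.7, 0.2, 0.1, 0.8]]
image_feats = [[0.9, 0.1, 0.4, 0.3], [0.2, 0.8, 0.6, 0.1]]

# Cross-modal step: text tokens attend over image regions, and the result
# is concatenated with the original text features as a simple fusion.
text_given_image, modal_weights = cross_attention(text_feats, image_feats)
fused = [t + a for t, a in zip(text_feats, text_given_image)]

# Cross-task step: a hypothetical sentiment representation attends over
# hypothetical emotion representations, so emotion cues can inform sentiment.
sentiment_state = [[0.3, 0.7, 0.1, 0.5]]
emotion_state = [[0.6, 0.1, 0.2, 0.9], [0.4, 0.4, 0.8, 0.0]]
sentiment_given_emotion, task_weights = cross_attention(sentiment_state, emotion_state)

print(len(fused), len(fused[0]))
print(task_weights[0])
```

In this sketch the same attention routine serves both roles; only the choice of queries and context changes. A trained model would typically add learned projections for queries, keys, and values, which are omitted here for brevity.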

Acknowledgment

This work is supported by the National Science Foundation of China under grant No. 62006212, the fund of the State Key Lab. for Novel Software Technology at Nanjing University under grant No. KFKT2021B41, the Industrial Science and Technology Research Project of Henan Province under grants No. 222102210031, 212102210418, and 212102310088, and the Doctoral Scientific Research Foundation of Zhengzhou Univ. of Light Industry under grants No. 2020BSJJ030 and 2020BSJJ031.

Author information

Authors and Affiliations

Software Engineering College, Zhengzhou University of Light Industry, Zhengzhou, China
Yazhou Zhang, Lu Rong & Rui Chen
Shandong Computer Science Center (National Supercomputing Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
Xiang Li

Corresponding authors

Correspondence to Lu Rong or Rui Chen.

Editor information

Editors and Affiliations

Martin Luther University Halle-Wittenberg, Halle, Germany
Matthias Hagen
Leiden University, Leiden, The Netherlands
Suzan Verberne
University of Glasgow, Glasgow, UK
Craig Macdonald
University of Duisburg-Essen, Essen, Germany
Christin Seifert
University of Stavanger, Stavanger, Norway
Krisztian Balog
Norwegian University of Science and Technology, Trondheim, Norway
Kjetil Nørvåg
University of Stavanger, Stavanger, Norway
Vinay Setty

About this paper

Cite this paper

Zhang, Y., Rong, L., Li, X., Chen, R. (2022). Multi-modal Sentiment and Emotion Joint Analysis with a Deep Attentive Multi-task Learning Model. In: Hagen, M., et al. Advances in Information Retrieval. ECIR 2022. Lecture Notes in Computer Science, vol 13185. Springer, Cham. https://doi.org/10.1007/978-3-030-99736-6_35

DOI: https://doi.org/10.1007/978-3-030-99736-6_35
Published: 05 April 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-99735-9
Online ISBN: 978-3-030-99736-6
eBook Packages: Computer Science, Computer Science (R0)
