Abstract
Emotion is seen as the external expression of sentiment, while sentiment is the essential nature of emotion. The two are tightly entangled in that each helps the understanding of the other, which has led to a new research topic: multi-modal sentiment and emotion joint analysis. There exist two key challenges in this field, namely multi-modal fusion and multi-task interaction. Most recent approaches treat sentiment analysis and emotion recognition as two independent tasks and fail to model the relationships between them. In this paper, we propose a novel multi-modal multi-task learning model, termed MMT, to address both issues generically. Specifically, two attention mechanisms are designed: cross-modal attention and cross-task attention. Cross-modal attention models multi-modal feature fusion, while cross-task attention captures the interaction between sentiment analysis and emotion recognition. Finally, we show empirically on two benchmark datasets that this method alleviates these problems and achieves better performance on the main task, sentiment analysis, with the help of the secondary emotion recognition task.
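To make the two attention mechanisms more concrete, the following is a minimal sketch and not the MMT implementation: the abstract does not specify the architecture, so this assumes plain scaled dot-product attention, and the feature shapes, the random projections `W_sent` and `W_emo`, and the helper `cross_attention` are all hypothetical and introduced only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, keys_values):
    """Scaled dot-product attention with queries from one source and
    keys/values from another (assumed form, not taken from the paper)."""
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)  # (n_queries, n_keys)
    weights = softmax(scores, axis=-1)             # each row sums to 1
    return weights @ keys_values, weights


rng = np.random.default_rng(0)
text = rng.normal(size=(5, 8))   # hypothetical: 5 text tokens, 8-dim features
image = rng.normal(size=(3, 8))  # hypothetical: 3 image regions, 8-dim features

# Cross-modal attention: text tokens attend over image regions.
fused, w_modal = cross_attention(text, image)

# Hypothetical task-specific projections of the fused features.
W_sent = rng.normal(size=(8, 8))
W_emo = rng.normal(size=(8, 8))
sent = fused @ W_sent
emo = fused @ W_emo

# Cross-task attention: the sentiment view attends over the emotion view.
sent_enriched, w_task = cross_attention(sent, emo)

print(fused.shape, w_modal.shape, sent_enriched.shape, w_task.shape)
```

In this sketch the same attention routine serves both roles; only the source of the queries and of the keys/values changes, which is one simple way to read "cross-modal" and "cross-task" under the stated assumptions.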
Acknowledgment
This work is supported by the National Science Foundation of China under grant No. 62006212, the fund of the State Key Lab. for Novel Software Technology at Nanjing University under grant No. KFKT2021B41, the Industrial Science and Technology Research Project of Henan Province under grants 222102210031, 212102210418, and 212102310088, and the Doctoral Scientific Research Foundation of Zhengzhou Univ. of Light Industry under grants No. 2020BSJJ030 and 2020BSJJ031.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, Y., Rong, L., Li, X., Chen, R. (2022). Multi-modal Sentiment and Emotion Joint Analysis with a Deep Attentive Multi-task Learning Model. In: Hagen, M., et al. Advances in Information Retrieval. ECIR 2022. Lecture Notes in Computer Science, vol 13185. Springer, Cham. https://doi.org/10.1007/978-3-030-99736-6_35
DOI: https://doi.org/10.1007/978-3-030-99736-6_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-99735-9
Online ISBN: 978-3-030-99736-6
eBook Packages: Computer Science, Computer Science (R0)