Abstract
Aspect-level multimodal sentiment analysis is a target-oriented, fine-grained sentiment analysis task that aims to determine the sentiment polarity of a given aspect of a sentence in conjunction with relevant multimodal data. Multimodal alignment and fusion remain a challenge for this task, and this paper addresses it by considering inter-modal local interactions. To this end, a co-attention guided local-global feature fusion (CLGFF) method is proposed. CLGFF mines both aspect-guided global multimodal features and the local fine-grained alignment between modalities, and then fuses them to better exploit the global-local semantic correlation. Extensive experiments are carried out on two aspect-level multimodal sentiment datasets, in which CLGFF is compared with a series of existing methods. The results show that the proposed CLGFF method better captures the local semantic correlation within each modality and the fine-grained consistency between different modalities, thereby improving the performance of aspect-level multimodal sentiment analysis.
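The abstract gives no equations, so the following is only a rough illustrative sketch of the general idea of combining aspect-guided global features with co-attention based local alignment. It is not the authors' CLGFF implementation. All function names (aspect_guided_global, co_attention_local, fuse) are hypothetical, and the use of dot-product scoring, mean pooling, and plain concatenation for fusion are assumptions made for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def weighted_sum(weights, vectors):
    # Weighted sum of equal-length vectors.
    dim = len(vectors[0])
    return [sum(w * vec[i] for w, vec in zip(weights, vectors)) for i in range(dim)]

def mean(vectors):
    n = len(vectors)
    return weighted_sum([1.0 / n] * n, vectors)

def aspect_guided_global(aspect, features):
    # Assumption: the aspect vector scores each token/region by dot product,
    # and the global feature is the attention-weighted sum.
    weights = softmax([dot(aspect, f) for f in features])
    return weighted_sum(weights, features)

def co_attention_local(text, regions):
    # Assumption: a token-region affinity matrix drives attention in both
    # directions; the attended vectors are mean-pooled and concatenated.
    affinity = [[dot(t, r) for r in regions] for t in text]
    text_to_image = [weighted_sum(softmax(row), regions) for row in affinity]
    columns = [[affinity[i][j] for i in range(len(text))] for j in range(len(regions))]
    image_to_text = [weighted_sum(softmax(col), text) for col in columns]
    return mean(text_to_image) + mean(image_to_text)

def fuse(aspect, text, regions):
    # Assumption: fusion is plain concatenation of global and local parts.
    global_part = aspect_guided_global(aspect, text) + aspect_guided_global(aspect, regions)
    local_part = co_attention_local(text, regions)
    return global_part + local_part

# Toy 2-dimensional features: three text tokens and two image regions.
aspect = [1.0, 0.0]
text = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
regions = [[0.9, 0.1], [0.2, 0.8]]
fused = fuse(aspect, text, regions)
print(len(fused), fused)
```

In a real model the fused vector would feed a sentiment classifier, and the attention scores would come from learned projections rather than raw dot products; those parts are omitted here because the abstract does not describe them.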
Acknowledgements
This work is supported by the National Natural Science Foundation of China (No. 62366010), the Key Lab of Trusted Software (k202060), and the CCF-Zhipu AI Large Model Fund.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Cai, G., Wang, S., Lv, G. (2024). Co-attention Guided Local-Global Feature Fusion for Aspect-Level Multimodal Sentiment Analysis. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14425. Springer, Singapore. https://doi.org/10.1007/978-981-99-8429-9_30
DOI: https://doi.org/10.1007/978-981-99-8429-9_30
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8428-2
Online ISBN: 978-981-99-8429-9
eBook Packages: Computer Science, Computer Science (R0)