Co-attention Guided Local-Global Feature Fusion for Aspect-Level Multimodal Sentiment Analysis

Cai, Guoyong; Wang, Shunjie; Lv, Guangrui

doi:10.1007/978-981-99-8429-9_30

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14425)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

Abstract

Aspect-level multimodal sentiment analysis is a target-oriented, fine-grained sentiment analysis task that aims to determine the sentiment polarity of a given aspect of a sentence in conjunction with relevant multimodal data. Multimodal alignment and fusion remain a challenge for this task, and this paper addresses it by modeling local inter-modal interactions. To this end, a co-attention guided local-global feature fusion (CLGFF) method is proposed. CLGFF mines both aspect-guided global multimodal features and local fine-grained alignments between modalities, and then fuses them to better exploit the global-local semantic correlation. Extensive experiments are carried out on two aspect-level multimodal sentiment datasets, and CLGFF is compared with a series of methods. The results show that CLGFF better captures the local semantic correlation within each modality and the fine-grained consistency between different modalities, thereby improving the performance of aspect-level multimodal sentiment analysis.
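
The abstract does not give the model's equations, so the following is only a minimal sketch of the general idea it describes: local co-attention alignment between text tokens and image regions, aspect-guided global pooling of each modality, and a fusion of the two. Everything specific here is an assumption made for illustration, not the authors' CLGFF implementation: the function names (`co_attention`, `aspect_guided_pool`, `fuse_local_global`), the use of scaled dot-product attention, mean pooling of the aligned features, fusion by concatenation, and the random toy inputs.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def co_attention(text, image):
    """Local alignment (assumed form): tokens attend over regions and vice versa.

    text:  (n_tokens, d) token features
    image: (n_regions, d) region features
    """
    affinity = text @ image.T / np.sqrt(text.shape[1])   # (n_tokens, n_regions)
    text_aligned = softmax(affinity, axis=1) @ image      # (n_tokens, d)
    image_aligned = softmax(affinity.T, axis=1) @ text    # (n_regions, d)
    return text_aligned, image_aligned


def aspect_guided_pool(aspect, feats):
    """Global feature (assumed form): aspect-weighted pooling over one modality."""
    scores = feats @ aspect / np.sqrt(aspect.shape[0])    # (n,)
    weights = softmax(scores, axis=0)                     # (n,), sums to 1
    return weights @ feats                                # (d,)


def fuse_local_global(aspect, text, image):
    """Concatenate pooled local and global features (fusion choice is an assumption)."""
    text_local, image_local = co_attention(text, image)
    local = np.concatenate([text_local.mean(axis=0), image_local.mean(axis=0)])
    global_feats = np.concatenate([
        aspect_guided_pool(aspect, text),
        aspect_guided_pool(aspect, image),
    ])
    return np.concatenate([local, global_feats])          # (4 * d,)


# Toy inputs: random vectors stand in for encoder outputs.
rng = np.random.default_rng(0)
d = 8
aspect = rng.normal(size=d)          # aspect representation
text = rng.normal(size=(5, d))       # 5 tokens
image = rng.normal(size=(7, d))      # 7 image regions

fused = fuse_local_global(aspect, text, image)
print(fused.shape)  # (32,)
```

In a full model the fused vector would feed a sentiment classifier, and the attention would use learned projections rather than raw dot products; both are omitted here to keep the sketch short.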

Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 62366010), the Key Lab of Trusted Software (k202060), and the CCF-Zhipu AI Large Model Fund.

Author information

Authors and Affiliations

Key Laboratory of Guangxi Trusted Software, College of Computer and Information Security, Guilin University of Electronic Technology, Guangxi, 541004, China
Guoyong Cai, Shunjie Wang & Guangrui Lv

Corresponding author

Correspondence to Shunjie Wang.

Editor information

Editors and Affiliations

Nanjing University of Information Science and Technology, Nanjing, China
Qingshan Liu
Xiamen University, Xiamen, China
Hanzi Wang
Beijing University of Posts and Telecommunications, Beijing, China
Zhanyu Ma
Sun Yat-sen University, Guangzhou, China
Weishi Zheng
Peking University, Beijing, China
Hongbin Zha
Chinese Academy of Sciences, Beijing, China
Xilin Chen
Chinese Academy of Sciences, Beijing, China
Liang Wang
Xiamen University, Xiamen, China
Rongrong Ji

About this paper

Cite this paper

Cai, G., Wang, S., Lv, G. (2024). Co-attention Guided Local-Global Feature Fusion for Aspect-Level Multimodal Sentiment Analysis. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14425. Springer, Singapore. https://doi.org/10.1007/978-981-99-8429-9_30

DOI: https://doi.org/10.1007/978-981-99-8429-9_30
Published: 24 December 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8428-2
Online ISBN: 978-981-99-8429-9
eBook Packages: Computer Science, Computer Science (R0)
