Co-attention Guided Local-Global Feature Fusion for Aspect-Level Multimodal Sentiment Analysis

  • Conference paper
Pattern Recognition and Computer Vision (PRCV 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14425)

Abstract

Aspect-level multimodal sentiment analysis is a target-oriented, fine-grained sentiment analysis task that aims to determine the sentiment polarity of a given aspect of a sentence in conjunction with the relevant multimodal data. Multimodal alignment and fusion remain a challenge for this task, and this paper addresses it by considering inter-modal local interactions. To this end, a co-attention guided local-global feature fusion (CLGFF) method is proposed. CLGFF mines both aspect-guided global multimodal features and the local fine-grained alignment between modalities, and then fuses them to better exploit the global-local semantic correlation. Extensive experiments on two aspect-level multimodal sentiment datasets compare CLGFF with a series of other methods. The results show that CLGFF better captures the local semantic correlation within each modality and the fine-grained consistency between different modalities, thereby improving the performance of aspect-level multimodal sentiment analysis.
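
The abstract gives no architectural details, so the following is only a minimal illustrative sketch of the general idea it names: an aspect-guided global branch, a co-attention local branch, and a fusion of the two. It is not the CLGFF model. The dot-product attention, the mean pooling, the fusion by concatenation, and every function name here are assumptions made for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def weighted_sum(weights, rows):
    # Combine feature rows into one vector using attention weights.
    dim = len(rows[0])
    return [sum(w * row[j] for w, row in zip(weights, rows)) for j in range(dim)]

def mean_pool(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def aspect_guided_pool(aspect, feats):
    # Global branch (assumed form): the aspect vector is the query
    # that attends over all features of one modality.
    weights = softmax([dot(aspect, f) for f in feats])
    return weighted_sum(weights, feats)

def co_attention(text, image):
    # Local branch (assumed form): every token attends over image regions,
    # and every region attends over tokens.
    text_to_image = [weighted_sum(softmax([dot(t, v) for v in image]), image)
                     for t in text]
    image_to_text = [weighted_sum(softmax([dot(v, t) for t in text]), text)
                     for v in image]
    return text_to_image, image_to_text

def fuse(aspect, text, image):
    # Fusion by concatenation is an assumption, not the paper's stated method.
    global_text = aspect_guided_pool(aspect, text)
    global_image = aspect_guided_pool(aspect, image)
    text_to_image, image_to_text = co_attention(text, image)
    local_text = mean_pool(text_to_image)
    local_image = mean_pool(image_to_text)
    return global_text + global_image + local_text + local_image

# Toy input: 3 token vectors, 2 region vectors, feature size 2.
aspect = [1.0, 1.0]
text = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
image = [[0.0, 1.0], [0.0, 1.0]]
fused = fuse(aspect, text, image)
```

With feature size d, the fused vector has length 4d: two aspect-guided global summaries followed by two co-attention local summaries. In a trained model such a vector would typically feed a sentiment classifier, which is outside the scope of this sketch.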

Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 62366010), the Key Lab of Trusted Software (k202060), and the CCF-Zhipu AI Large Model Fund.

Author information

Corresponding author

Correspondence to Shunjie Wang.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Cai, G., Wang, S., Lv, G. (2024). Co-attention Guided Local-Global Feature Fusion for Aspect-Level Multimodal Sentiment Analysis. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14425. Springer, Singapore. https://doi.org/10.1007/978-981-99-8429-9_30

  • DOI: https://doi.org/10.1007/978-981-99-8429-9_30

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8428-2

  • Online ISBN: 978-981-99-8429-9

  • eBook Packages: Computer Science, Computer Science (R0)
