Abstract
Multimodal aspect-based sentiment analysis aims to predict the sentiment polarity of every aspect target in a text-image pair. Most existing methods fail to extract fine-grained visual sentiment information, so the two modalities differ in granularity and are difficult to align. They also overlook the deep interaction between syntactic structure and semantic information. In this paper, we propose an Aspect-aware Semantic Feature Enhancement Network (ASFEN) for multimodal aspect-based sentiment analysis that learns aspect-aware semantic and sentiment information from both images and texts. Specifically, images are converted into textual descriptions carrying fine-grained emotional cues, and we construct dependency syntax trees and multi-layer syntax masks to fuse syntactic and semantic information through graph convolution. Extensive experiments on two multimodal Twitter datasets demonstrate the superiority of ASFEN over existing methods. The code is publicly available at https://github.com/lllppi/ASFEN.
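The abstract's core mechanism, multi-layer syntax masks applied during graph convolution over a dependency tree, can be illustrated with a minimal sketch. This is not the authors' implementation: the distance matrix, dimensions, and layer widths below are hypothetical, and each layer simply widens the dependency-distance threshold so deeper layers aggregate over syntactically more distant tokens.

```python
import numpy as np

# Hypothetical pairwise dependency-tree distances for a 4-token sentence
# (symmetric; 0 on the diagonal).
dist = np.array([[0, 1, 1, 2],
                 [1, 0, 2, 3],
                 [1, 2, 0, 1],
                 [2, 3, 1, 0]])

def syntax_mask(dist, k):
    """Layer-k mask: a token attends only to tokens within
    dependency distance k (the diagonal gives a self-loop)."""
    return (dist <= k).astype(float)

def gcn_layer(H, A, W):
    """One graph-convolution step: degree-normalized neighbor
    aggregation over the masked adjacency, then ReLU."""
    deg = A.sum(axis=1, keepdims=True)  # >= 1 thanks to self-loops
    return np.maximum(((A @ H) / deg) @ W, 0.0)

rng = np.random.default_rng(0)
H = rng.standard_normal((4, 8))   # token representations
W = rng.standard_normal((8, 8))   # shared layer weights (toy choice)

# Stack layers with progressively wider syntax masks (k = 1, 2, 3),
# letting information flow further along the dependency tree per layer.
for k in (1, 2, 3):
    H = gcn_layer(H, syntax_mask(dist, k), W)

print(H.shape)  # (4, 8)
```

In the paper's setting these graph features would additionally be fused with the aspect-aware textual and image-derived representations; the sketch only shows the masked-convolution step itself.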






Data Availability
The open-source address for the code and data is provided in the manuscript.
Acknowledgements
We sincerely thank the editors and reviewers for their hard work. We also gratefully acknowledge the support of the National Natural Science Foundation of China (No. 62076103), the Guangdong Basic and Applied Basic Research Project (No. 2021A1515011171), and the Guangzhou Basic Research Plan, Basic and Applied Basic Research Project (No. 202102080282).
Author information
Authors and Affiliations
Contributions
BZ, LX and RZL wrote the main manuscript text. YY, RYL and HD reviewed the paper, participated in the seminar, and made suggestions for revisions. All authors reviewed the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zeng, B., Xie, L., Li, R. et al. Aspect-aware semantic feature enhanced networks for multimodal aspect-based sentiment analysis. J Supercomput 81, 64 (2025). https://doi.org/10.1007/s11227-024-06472-4