Abstract
Multimodal sentiment analysis, which aims to predict sentiment polarity from information across different modalities, has received widespread attention from the research community in recent years. However, most existing methods have a fixed model architecture in which data can flow only along an established path, leading to poor generalization across different types of data. Furthermore, most methods model only intra- or intermodal interactions and do not combine the two. In this paper, we propose the Smart Routing Attention Network (SmartRAN). SmartRAN adaptively selects the data flow path on the basis of its smart routing attention module, avoiding the poor adaptability and generalizability caused by a fixed model architecture. In addition, SmartRAN learns both intra- and intermodal information, which enhances the semantic consistency of the combined information and improves the model's ability to capture complex relationships. Extensive experiments on two benchmark datasets, CMU-MOSI and CMU-MOSEI, demonstrate that SmartRAN outperforms state-of-the-art models.
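The routing idea described in the abstract can be pictured concretely. The sketch below is a minimal, hypothetical illustration and not the authors' SmartRAN implementation: a small router network produces per-sample weights over two candidate attention paths, an intra-modal self-attention path and an inter-modal cross-attention path, and mixes their outputs. The module name SoftRoutingBlock, the two-path design, and all dimensions are assumptions made for the example.

```python
import torch
import torch.nn as nn

class SoftRoutingBlock(nn.Module):
    """Illustrative sketch only: soft-routes a query modality through two
    candidate attention paths (intra-modal vs. cross-modal attention).
    This is not the SmartRAN module from the paper."""

    def __init__(self, dim: int, n_heads: int = 4):
        super().__init__()
        self.intra_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        # Router: pooled query features -> soft weights over the two paths.
        self.router = nn.Sequential(nn.Linear(dim, 2), nn.Softmax(dim=-1))
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_x, dim) query modality; other: (batch, seq_o, dim) context modality.
        intra, _ = self.intra_attn(x, x, x)          # intra-modal interaction
        cross, _ = self.cross_attn(x, other, other)  # inter-modal interaction
        w = self.router(x.mean(dim=1))               # (batch, 2) per-sample path weights
        mixed = w[:, 0, None, None] * intra + w[:, 1, None, None] * cross
        return self.norm(x + mixed)                  # residual connection + layer norm

# Usage with random text/audio features of matching hidden size (hypothetical shapes).
text = torch.randn(8, 50, 128)
audio = torch.randn(8, 75, 128)
block = SoftRoutingBlock(dim=128)
out = block(text, audio)  # (8, 50, 128)
```

The soft mixture here merely conveys the general flavor of routing between intra- and intermodal interactions; SmartRAN's actual smart routing attention module is specified in the paper itself.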
Data availability and access
All the datasets used in this research are benchmark data that are publicly available online.
Acknowledgements
We thank the anonymous reviewers for their insightful comments. This study was partially supported by the Tianshan Talent Training Program in the Autonomous Region, China (grant number: 2023TSYCLJ0023); the Natural Science Foundation of Xinjiang Uygur Autonomous Region (grant number: 2023D01C176); the Xinjiang Uygur Autonomous Region Universities Fundamental Research Funds Scientific Research Project (grant number: XJEDU2022P018); the Key Research and Development Projects in the Autonomous Region, China (grant numbers: 2023A03001 and 2021B01002); and the Key Program of the National Natural Science Foundation of China (grant number: U2003208).
Author information
Authors and Affiliations
Contributions
Xueyu Guo: Conceptualization, Methodology, Validation, Investigation, Writing - Original Draft, Writing - Review & Editing, Visualization.
Shengwei Tian: Validation, Writing - Review & Editing, Supervision, Funding acquisition.
Long Yu: Validation, Writing - Review & Editing, Supervision.
Xiaoyu He: Conceptualization, Validation, Writing - Review & Editing.
Corresponding author
Ethics declarations
Competing Interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethical and informed consent for data used
Both multimodal sentiment analysis datasets used in this study are open-source datasets and do not involve any ethical issues.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Guo, X., Tian, S., Yu, L. et al. SmartRAN: Smart Routing Attention Network for multimodal sentiment analysis. Appl Intell 54, 12742–12763 (2024). https://doi.org/10.1007/s10489-024-05839-7