Abstract
Representation learning is a significant and challenging task in multimodal sentiment analysis (MSA). It aims to improve model performance by learning effective unimodal or multimodal representations. Previous works have proposed various constraints to obtain desired properties of these representations, but they pay little attention to filtering out task-irrelevant information, which is closely tied to the robustness of a representation. In this paper, we design a framework based on the information bottleneck to filter out such noise. By maximizing the mutual information between pairwise unimodal representations and minimizing the mutual information between each unimodal representation and its corresponding input, we encourage each unimodal representation to retain more task-relevant information while discarding task-irrelevant information. Furthermore, an attention bottleneck is embedded into the unimodal encoding process to enable interaction between modalities. To improve the discriminability of the multimodal representation, we then introduce supervised contrastive learning as an additional constraint. Finally, we conduct extensive experiments on two public multimodal benchmark datasets, and the results validate the effectiveness of our model.
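To make the objectives above concrete, the following minimal PyTorch sketch illustrates the three kinds of constraints the abstract describes: an InfoNCE-style lower bound for maximizing mutual information between pairwise unimodal representations, a CLUB-style upper bound for minimizing mutual information between a unimodal representation and its input, and a supervised contrastive loss on the fused multimodal representation. All function names, the auxiliary network `q_net`, and the unit-variance Gaussian form of the CLUB estimator are illustrative assumptions rather than the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def infonce_lower_bound(z_a, z_b, temperature=0.1):
    """InfoNCE-style lower bound on I(z_a; z_b); maximize to share information.

    Aligned pairs within the batch act as positives, all other pairings as
    negatives. (Assumed form; the paper may use a different estimator.)
    """
    z_a, z_b = F.normalize(z_a, dim=-1), F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature                  # (B, B) similarities
    labels = torch.arange(z_a.size(0), device=z_a.device)
    return -F.cross_entropy(logits, labels)               # higher = more shared info

def club_upper_bound(z, x_emb, q_net):
    """CLUB-style upper bound on I(z; x); minimize to filter input noise.

    q_net is a hypothetical auxiliary network predicting the mean of a
    unit-variance Gaussian q(z | x).
    """
    mu = q_net(x_emb)                                     # (B, D) predicted means
    positive = -((z - mu) ** 2).mean()                    # matched pairs
    negative = -((z.unsqueeze(0) - mu.unsqueeze(1)) ** 2).mean()  # mismatched pairs
    return positive - negative

def sup_con_loss(h, labels, temperature=0.1):
    """Supervised contrastive loss on fused representations h of shape (B, D).

    Samples sharing a (discretized) sentiment label attract; others repel.
    """
    h = F.normalize(h, dim=-1)
    sim = h @ h.t() / temperature
    mask = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    mask.fill_diagonal_(0)                                # self-pairs are not positives
    logits = sim - sim.max(dim=1, keepdim=True).values.detach()  # numerical stability
    exp = torch.exp(logits) * (1 - torch.eye(len(h), device=h.device))
    log_prob = logits - torch.log(exp.sum(dim=1, keepdim=True) + 1e-8)
    denom = mask.sum(dim=1).clamp(min=1)                  # avoid division by zero
    return -(mask * log_prob).sum(dim=1).div(denom).mean()
```

In training, these terms would typically enter a weighted total objective alongside the task loss, e.g. L = L_task - alpha * I_NCE + beta * I_CLUB + gamma * L_SupCon, with the weights treated as hyperparameters.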
Acknowledgement
This work is supported by the Natural Science Foundation of Fujian Province of China (No. 2020J06001) and the Youth Innovation Fund of Xiamen (No. 3502Z20206059). It is also supported by projects S202210384799 and S202210384831 of the XMU Training Program of Innovation and Entrepreneurship for Undergraduates.
Cite this paper
Zhang, T., Dong, C., Su, J., Zhang, H., Li, Y.: Unimodal and multimodal integrated representation learning via improved information bottleneck for multimodal sentiment analysis. In: Lu, W., Huang, S., Hong, Y., Zhou, X. (eds.) Natural Language Processing and Chinese Computing (NLPCC 2022). LNCS, vol. 13551. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-17120-8_44