ABSTRACT
Multi-modal emotion analysis has become an active research field. In real-world scenarios, however, emotion data must often be analyzed and recognized in the presence of noise, and effectively integrating information from different modalities to enhance the overall robustness of the model remains a challenge. To address this, we propose an improved approach that leverages modality latent information to enhance cross-modal interaction and improve the robustness of multi-modal emotion classification models. Specifically, we apply a multi-period-based preprocessing technique to the audio modality. In addition, we introduce a random modality noise injection strategy to augment the training data and improve generalization. Finally, we employ a composite fusion method to integrate features from different modalities, effectively promoting cross-modal information interaction and further enhancing the overall robustness of the model. We evaluate the proposed method on the MER-NOISE sub-challenge of MER2023. Experimental results show that our improved multi-modal emotion classification model achieves a weighted F1 score of 69.66% and an MSE of 0.92 on the MER-NOISE test set, for an overall score of 46.69%, a 5.69% improvement over the baseline. These results demonstrate the effectiveness of the proposed approach in further enhancing model robustness.
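To make the random modality noise injection strategy concrete, the following is a minimal sketch, not the paper's implementation: it assumes per-sample feature vectors for each modality and Gaussian corruption of one randomly chosen modality per sample; the function name and the parameters `p_corrupt` and `noise_std` are hypothetical.

```python
import torch

def random_modality_noise(features, p_corrupt=0.3, noise_std=0.1):
    """Randomly corrupt one modality per sample to augment training data.

    features: dict mapping a modality name ('audio', 'video', 'text')
    to a tensor of shape (batch, dim). Hypothetical interface; the
    exact corruption scheme in the paper may differ.
    """
    names = list(features.keys())
    batch = features[names[0]].size(0)
    out = {m: f.clone() for m, f in features.items()}
    for i in range(batch):
        if torch.rand(1).item() < p_corrupt:
            # Pick one modality at random and inject Gaussian noise
            # into that sample's feature vector.
            m = names[torch.randint(len(names), (1,)).item()]
            out[m][i] += noise_std * torch.randn_like(out[m][i])
    return out
```

Applied to each mini-batch before fusion, a routine of this kind would expose the model to occasionally corrupted audio, visual, or text features, encouraging representations that degrade gracefully under noise.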
- Zheng Lian, Haiyang Sun, Licai Sun, Jinming Zhao, Ye Liu, B. Liu, Jiangyan Yi, M. Wang, E. Cambria, Guoying Zhao, Björn Schuller, and Jianhua Tao. MER 2023: Multi-label learning, modality robustness, and semi-supervised learning. ArXiv, abs/2304.08981, 2023.
- Zheng Lian, Lang Chen, Licai Sun, B. Liu, and Jianhua Tao. GCNet: Graph completion network for incomplete multimodal learning in conversation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45:8419--8432, 2022.
- Licai Sun, Zheng Lian, Bin Liu, and Jianhua Tao. Efficient multimodal transformer with dual-level feature restoration for robust multimodal sentiment analysis. IEEE Transactions on Affective Computing.
- Ziqi Yuan, Wei Li, Hua Xu, and Wenmeng Yu. Transformer-based feature reconstruction network for robust multimodal sentiment analysis. In Proceedings of the 29th ACM International Conference on Multimedia, 2021.
- Björn W. Schuller. Speech emotion recognition: two decades in a nutshell, benchmarks, and ongoing trends. Communications of the ACM, 61(5):90--99, 2018.
- Baijun Xie, Mariia Sidulova, and Chung Hyuk Park. Robust multimodal emotion recognition from conversation with transformer-based crossmodality fusion. Sensors, 21(14):4913, 2021.
- Mingli Song, Mingyu You, Na Li, and Chun Chen. A robust multimodal approach for emotion recognition. Neurocomputing, 71(10):1913--1920, 2008.
- Tadas Baltrusaitis, Peter Robinson, and Louis-Philippe Morency. OpenFace: An open source facial behavior analysis toolkit. In IEEE Winter Conference on Applications of Computer Vision, 2016.
- Zhuoyuan Yao, Di Wu, Xiong Wang, Binbin Zhang, Fan Yu, Chao Yang, Zhendong Peng, Xiaoyu Chen, Lei Xie, and Xin Lei. WeNet: Production oriented streaming and non-streaming end-to-end speech recognition toolkit. In Interspeech, 2021.
- Haixu Wu, Jiehui Xu, Jianmin Wang, and Mingsheng Long. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. In Neural Information Processing Systems, 2021.
- Chris Chatfield. The analysis of time series: An introduction. Biometrics, 52(3):1162, 1996.
- Tian Zhou, Ziqing Ma, Qingsong Wen, Xue Wang, Liang Sun, and Rong Jin. FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting. arXiv e-prints, 2022.
- Haixu Wu, Teng Hu, Yong Liu, Hang Zhou, Jianmin Wang, and Mingsheng Long. TimesNet: Temporal 2D-variation modeling for general time series analysis. ArXiv, abs/2210.02186, 2022.
- Kaiming He, X. Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770--778, 2016.
- Z. Zhao, Q. Liu, and S. Wang. Learning deep global multi-scale and local attention features for facial expression recognition in the wild. IEEE Transactions on Image Processing, 30:6544--6556, 2021.
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. ArXiv, abs/1810.04805, 2019.
- Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. RoBERTa: A robustly optimized BERT pretraining approach. ArXiv, abs/1907.11692, 2019.
- Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Shijin Wang, and Guoping Hu. Revisiting pre-trained models for Chinese natural language processing. ArXiv, abs/2004.13922, 2020.
- Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, and Abdel-rahman Mohamed. HuBERT: Self-supervised speech representation learning by masked prediction of hidden units. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29:3451--3460, 2021.
- Zheng Lian, Bin Liu, and Jianhua Tao. DECN: Dialogical emotion correction network for conversational emotion recognition. Neurocomputing, 454:483--495, 2021.
- Z. Lian, B. Liu, and J. Tao. CTNet: Conversational transformer network for emotion recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29:985--1000, 2021.
- D. Kollias, A. Schulc, E. Hajiyev, and S. Zafeiriou. Analysing affective behavior in the first ABAW 2020 competition. In 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020), pages 637--643, 2020.
- Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. ArXiv, abs/1412.6980, 2014.
- Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1):1929--1958, 2014.