Research Article · DOI: 10.1145/3607865.3613179

An Improved Method for Enhancing Robustness of Multimodal Sentiment Classification Models via Utilizing Modality Latent Information

Published: 29 October 2023

ABSTRACT

Multimodal emotion analysis has become an active research field. In real-world scenarios, however, emotion data must often be analyzed and recognized in the presence of noise, and effectively integrating information from different modalities to improve a model's overall robustness remains a challenge. To address this, we propose an improved approach that leverages modality latent information to enhance cross-modal interaction and improve the robustness of multimodal emotion classification models. Specifically, we apply a multi-period-based preprocessing technique to the audio modality. We further introduce a random modality noise injection strategy that augments the training data and improves generalization. Finally, we employ a composite fusion method to integrate features from the different modalities, effectively promoting cross-modal information interaction and strengthening the overall robustness of the model. We evaluate the proposed method on the MER-NOISE sub-challenge of MER2023. Our improved multimodal emotion classification model achieves a weighted F1 score of 69.66% and an MSE of 0.92 on the MER-NOISE test set, giving an overall score of 46.69%, a 5.69% improvement over the baseline. These results demonstrate the effectiveness of the proposed approach in further enhancing model robustness.
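The abstract names three technical components: multi-period audio preprocessing, random modality noise injection, and a composite fusion step. The page gives no implementation details, so the following is a minimal, purely illustrative PyTorch sketch of what the noise-injection augmentation could look like; the function name inject_modality_noise, the additive-Gaussian noise model, and the drop_prob/noise_std rates are all assumptions, not the authors' published method.

```python
# Illustrative sketch only: the paper does not publish its noise-injection
# code, so the noise model and rates below are assumptions.
import torch

def inject_modality_noise(feats, drop_prob=0.3, noise_std=0.1):
    """Randomly corrupt one modality per sample in a batch.

    feats: dict mapping modality name ("text", "audio", "visual") to a
    (batch, seq_len, dim) feature tensor.
    """
    names = list(feats.keys())
    batch = feats[names[0]].size(0)
    choice = torch.randint(len(names), (batch,))   # which modality to hit
    corrupt = torch.rand(batch) < drop_prob        # whether to hit it at all
    out = {}
    for i, name in enumerate(names):
        x = feats[name].clone()
        mask = corrupt & (choice == i)             # samples whose modality i is corrupted
        if mask.any():
            # Additive Gaussian noise (assumed); zero-masking the whole
            # modality would be another plausible choice.
            x[mask] = x[mask] + noise_std * torch.randn_like(x[mask])
        out[name] = x
    return out

# Example: batch of 8 samples, three modalities with different feature dims.
feats = {"text": torch.randn(8, 50, 768),
         "audio": torch.randn(8, 100, 1024),
         "visual": torch.randn(8, 32, 512)}
noisy = inject_modality_noise(feats)
```

In a training loop, a transform like this would be applied to each batch's per-modality features before fusion, so that the model learns to tolerate a corrupted or missing modality at test time.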


Published in

MRAC '23: Proceedings of the 1st International Workshop on Multimodal and Responsible Affective Computing
October 2023, 88 pages
ISBN: 9798400702884
DOI: 10.1145/3607865
Copyright © 2023 ACM


Publisher: Association for Computing Machinery, New York, NY, United States
