ABSTRACT
In multimodal emotion recognition with multi-label learning (MER-MULTI), leveraging the correlation between discrete and dimensional emotions is crucial for improving model performance. However, the feature distributions of the training set and the testing set may not match, so a trained model may fail to adapt to the label correlations present in the testing set. A significant challenge in MER-MULTI is therefore how to match the feature distributions of training and testing samples. To tackle this issue, we propose a method called Label Distribution Adaptation for MER-MULTI. Specifically, we adapt the label distribution between the training set and the testing set by removing training samples whose features do not match the testing set. This enhances the model's performance and generalization on testing data, enabling it to better capture the correlations between labels. Furthermore, to alleviate the difficulty of model training and inference, we design a novel loss function called the Multi-label Emotion Joint Learning (MEJL) loss, which combines the correlations between discrete and dimensional emotions. Through contrastive learning, we transform the shared feature distribution of multiple labels into a space in which discrete and dimensional emotions are consistent, which facilitates the model's learning of the relationships between them. Finally, we evaluated the proposed method, which achieved second place in the MER-MULTI task of the MER 2023 Challenge.
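The contrastive-learning component described above builds on supervised contrastive learning (Khosla et al., 2020), in which samples sharing a label are pulled together in embedding space while others are pushed apart. The sketch below is purely illustrative and is not the paper's MEJL loss: it shows a standard supervised contrastive loss over discrete emotion labels only, with hypothetical inputs (`features` as L2-normalizable embeddings, `labels` as discrete emotion ids); the actual method additionally couples discrete and dimensional emotions.

```python
import numpy as np

def supcon_loss(features, labels, temperature=0.07):
    """Supervised contrastive loss (Khosla et al., 2020), NumPy sketch.

    features: (N, D) embeddings; labels: (N,) discrete emotion ids.
    Samples with the same label are treated as positives for each other.
    """
    # L2-normalize embeddings so dot products are cosine similarities.
    features = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = features @ features.T / temperature
    np.fill_diagonal(sim, -np.inf)  # exclude self-pairs from the softmax

    # Log-softmax over each anchor's similarities to all other samples.
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))

    # Positives: other samples with the same discrete label.
    pos_mask = (labels[:, None] == labels[None, :]) & ~np.eye(len(labels), dtype=bool)
    pos_counts = pos_mask.sum(axis=1)
    valid = pos_counts > 0  # anchors with at least one positive

    # Mean negative log-likelihood of positives, averaged over valid anchors.
    per_anchor = -np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return (per_anchor[valid] / pos_counts[valid]).mean()
```

With correct labels the loss is lower than with shuffled labels, since same-emotion embeddings cluster and the positives dominate each anchor's softmax; in the full method, a similar mechanism encourages embeddings to be consistent across discrete and dimensional annotations.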