DOI: 10.1145/3581783.3612517

Multi-label Emotion Analysis in Conversation via Multimodal Knowledge Distillation

Published: 27 October 2023

Abstract

Evaluating speaker emotion in conversations is crucial for applications that require human-computer interaction. However, co-occurrences of multiple emotional states (e.g., 'anger' and 'frustration' may occur together, or one may trigger the other) and their dynamic evolution can vary dramatically with the speaker's internal context (e.g., personal socio-cultural, educational, and demographic background) and external context. Prior work has largely focused on estimating only the dominant emotion observed in a speaker at a given time, which is prone to misleading classification decisions on difficult multi-label cases at test time. In this work, we present Self-supervised Multi-Label Peer Collaborative Distillation (SeMuL-PCD) learning via an efficient Multimodal Transformer Network, in which complementary feedback from multiple mode-specific peer networks (e.g., transcript, audio, visual) is distilled into a single mode-ensembled fusion network that estimates multiple emotions simultaneously. The proposed Multimodal Distillation Loss calibrates the fusion network by minimizing its Kullback-Leibler divergence with the peer networks. Additionally, each peer network is conditioned with a self-supervised contrastive objective to improve generalization across diverse socio-demographic speaker backgrounds. By enabling peer collaborative learning in which each network independently learns its mode-specific discriminative patterns, SeMuL-PCD remains effective across different conversation environments. In particular, the model not only outperforms current state-of-the-art models on several large-scale public datasets (e.g., MOSEI, EmoReact, and ElderReact) but also achieves around a 17% improvement in weighted F1-score in cross-dataset experimental settings. The model also demonstrates impressive generalization across age-diverse and demographically diverse populations.
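The abstract describes the Multimodal Distillation Loss as a KL-divergence term that calibrates the mode-ensembled fusion network against the mode-specific peer networks. The paper's exact formulation is not reproduced on this page, so the following is only a minimal PyTorch sketch under stated assumptions: each emotion label is treated as an independent binary prediction (multi-label setting), peer outputs are softened with a temperature, and the function name, the `temperature` parameter, and the per-label binary KL are illustrative choices, not the authors' implementation.

```python
import torch

def multimodal_distillation_loss(fusion_logits, peer_logits_list, temperature=2.0):
    """Hypothetical peer-to-fusion distillation sketch (not the paper's code).

    fusion_logits:    (batch, num_labels) logits from the mode-ensembled fusion network
    peer_logits_list: list of (batch, num_labels) logits, one per mode-specific peer
                      (e.g., transcript, audio, visual)
    Treats each emotion label as an independent Bernoulli variable and averages
    KL(peer || fusion) over peers and labels, with temperature-softened targets.
    """
    eps = 1e-7
    fusion_prob = torch.sigmoid(fusion_logits / temperature)
    loss = 0.0
    for peer_logits in peer_logits_list:
        # Peers act as teachers here, so their soft targets are detached.
        peer_prob = torch.sigmoid(peer_logits.detach() / temperature)
        # Binary KL per label: p*log(p/q) + (1-p)*log((1-p)/(1-q))
        kl = peer_prob * torch.log((peer_prob + eps) / (fusion_prob + eps)) \
           + (1 - peer_prob) * torch.log((1 - peer_prob + eps) / (1 - fusion_prob + eps))
        loss = loss + kl.mean()
    return loss / len(peer_logits_list)

# Toy usage: 8 utterances, 6 emotion labels, three hypothetical peers.
fusion = torch.randn(8, 6)
peers = [torch.randn(8, 6) for _ in range(3)]
print(multimodal_distillation_loss(fusion, peers))
```

In practice this term would be combined with the multi-label classification loss and the per-peer self-supervised contrastive objective mentioned in the abstract; those components are not sketched here.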


Cited By

  • (2024) A Unimodal Valence-Arousal Driven Contrastive Learning Framework for Multimodal Multi-Label Emotion Recognition. Proceedings of the 32nd ACM International Conference on Multimedia, 622-631. https://doi.org/10.1145/3664647.3681638 (online publication date: 28 Oct 2024)
  • (2024) Smile: Spiking Multi-Modal Interactive Label-Guided Enhancement Network for Emotion Recognition. 2024 IEEE International Conference on Multimedia and Expo (ICME), 1-6. https://doi.org/10.1109/ICME57554.2024.10688152 (online publication date: 15 Jul 2024)
  • (2024) Intermediate Layer Attention Mechanism for Multimodal Fusion in Personality and Affect Computing. IEEE Access, 12, 112776-112793. https://doi.org/10.1109/ACCESS.2024.3442377 (online publication date: 2024)


Published In

MM '23: Proceedings of the 31st ACM International Conference on Multimedia
October 2023
9913 pages
ISBN: 9798400701085
DOI: 10.1145/3581783
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. collaborative learning
  2. emotion analysis
  3. knowledge distillation
  4. multi-label classification
  5. transformer

Qualifiers

  • Research-article

Conference

MM '23
Sponsor:
MM '23: The 31st ACM International Conference on Multimedia
October 29 - November 3, 2023
Ottawa ON, Canada

Acceptance Rates

Overall Acceptance Rate: 2,145 of 8,556 submissions (25%)


Article Metrics

  • Downloads (last 12 months): 367
  • Downloads (last 6 weeks): 27
Reflects downloads up to 25 Feb 2025

