DOI: 10.1145/3606039.3613106
Research Article

MMT-GD: Multi-Modal Transformer with Graph Distillation for Cross-Cultural Humor Detection

Published: 29 October 2023

Abstract

In this paper, we present a solution for the Cross-Cultural Humor Detection (MuSe-Humor) sub-challenge, which is part of the Multimodal Sentiment Analysis Challenge (MuSe) 2023. The MuSe-Humor task aims to detect humor from multimodal data, including video, audio, and text, in a cross-cultural context: the training data consists of German recordings, while the test data consists of English recordings. To tackle this sub-challenge, we propose a method called MMT-GD, which leverages a multimodal transformer model to effectively integrate the multimodal data. Additionally, we incorporate graph distillation to ensure that the fusion process captures discriminative features from each modality, avoiding excessive reliance on any single modality. Experimental results validate the effectiveness of our approach, achieving an Area Under the Curve (AUC) score of 0.8704 on the test set and securing third place in the challenge.
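The two ingredients the abstract names, transformer-based fusion of modality features and a graph-distillation term that keeps each modality's branch discriminative, can be sketched loosely as follows. This is a hypothetical minimal reconstruction, not the authors' implementation: the class name `MMTGDSketch`, the feature dimensions, the per-modality heads, and the KL-based, learned-edge-weight distillation loss are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MMTGDSketch(nn.Module):
    """Hypothetical sketch of a multimodal transformer with graph distillation.

    Each modality (e.g. audio, video, text) is projected to a shared width,
    fused as a short token sequence by a transformer encoder, and regularised
    so that every modality branch softly mimics the others via learned
    pairwise edge weights (the "graph" in graph distillation).
    """

    def __init__(self, dims=(64, 128, 256), d_model=32, n_heads=4, n_classes=2):
        super().__init__()
        # One linear projection per modality into a shared embedding space.
        self.proj = nn.ModuleList([nn.Linear(d, d_model) for d in dims])
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=64, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)
        # A classification head per modality, plus one for the fused token mean.
        self.uni_heads = nn.ModuleList([nn.Linear(d_model, n_classes) for _ in dims])
        self.fused_head = nn.Linear(d_model, n_classes)
        # Learnable directed edge weights between modality nodes.
        self.edges = nn.Parameter(torch.zeros(len(dims), len(dims)))

    def forward(self, feats):
        # feats: list of per-modality feature vectors, each (batch, dim_m).
        tokens = torch.stack([p(x) for p, x in zip(self.proj, feats)], dim=1)
        fused = self.fusion(tokens)                       # (batch, n_modalities, d_model)
        uni_logits = [h(fused[:, i]) for i, h in enumerate(self.uni_heads)]
        return self.fused_head(fused.mean(dim=1)), uni_logits

    def graph_distillation_loss(self, uni_logits):
        # Row-normalised edge weights decide how strongly modality i
        # distils from modality j (teacher detached, KL divergence).
        w = F.softmax(self.edges, dim=1)
        loss = 0.0
        for i, li in enumerate(uni_logits):
            for j, lj in enumerate(uni_logits):
                if i != j:
                    loss = loss + w[i, j] * F.kl_div(
                        F.log_softmax(li, dim=-1),
                        F.softmax(lj.detach(), dim=-1),
                        reduction="batchmean")
        return loss

# Illustrative training step on random features (batch of 4, binary humor label).
model = MMTGDSketch()
feats = [torch.randn(4, d) for d in (64, 128, 256)]
logits, uni = model(feats)
labels = torch.randint(0, 2, (4,))
loss = F.cross_entropy(logits, labels) + 0.1 * model.graph_distillation_loss(uni)
loss.backward()
```

The weighting factor of 0.1 on the distillation term is likewise an arbitrary placeholder; in practice it would be a tuned hyperparameter balancing the task loss against the cross-modal regulariser.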


Cited By

  • The MuSe 2024 Multimodal Sentiment Analysis Challenge: Social Perception and Humor Recognition. In Proceedings of the 5th Multimodal Sentiment Analysis Challenge and Workshop: Social Perception and Humor (2024), 1--9. https://doi.org/10.1145/3689062.3689088
  • Social Perception Prediction for MuSe 2024: Joint Learning of Multiple Perceptions. In Proceedings of the 5th Multimodal Sentiment Analysis Challenge and Workshop: Social Perception and Humor (2024), 52--59. https://doi.org/10.1145/3689062.3689087
  • DPP: A Dual-Phase Processing Method for Cross-Cultural Humor Detection. In Proceedings of the 5th Multimodal Sentiment Analysis Challenge and Workshop: Social Perception and Humor (2024), 70--78. https://doi.org/10.1145/3689062.3689080
  • Efficient Feature Extraction and Late Fusion Strategy for Audiovisual Emotional Mimicry Intensity Estimation. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 4866--4872. https://doi.org/10.1109/CVPRW63382.2024.00490
  • MuSe 2023 Challenge: Multimodal Prediction of Mimicked Emotions, Cross-Cultural Humour, and Personalised Recognition of Affects. In Proceedings of the 31st ACM International Conference on Multimedia (2023), 9723--9725. https://doi.org/10.1145/3581783.3610943

Published In

MuSe '23: Proceedings of the 4th Multimodal Sentiment Analysis Challenge and Workshop: Mimicked Emotions, Humour and Personalisation
November 2023, 113 pages
ISBN: 9798400702709
DOI: 10.1145/3606039

Publisher

Association for Computing Machinery, New York, NY, United States

      Author Tags

      1. humor detection
      2. multimodal fusion
      3. multimodal sentiment analysis
      4. transformer

Conference

MM '23
Overall Acceptance Rate: 14 of 17 submissions, 82%
