DOI: 10.1145/3606039.3613106
Research Article

MMT-GD: Multi-Modal Transformer with Graph Distillation for Cross-Cultural Humor Detection

Published: 29 October 2023

Abstract

In this paper, we present a solution for the Cross-Cultural Humor Detection (MuSe-Humor) sub-challenge, which is part of the Multimodal Sentiment Analysis Challenge (MuSe) 2023. The MuSe-Humor task aims to detect humor from multimodal data, including video, audio, and text, in a cross-cultural context: the training data consists of German recordings, while the test data consists of English recordings. To tackle this sub-challenge, we propose a method called MMT-GD, which leverages a multimodal transformer model to effectively integrate the multimodal data. Additionally, we incorporate graph distillation to ensure that the fusion process captures discriminative features from each modality, avoiding excessive reliance on any single modality. Experimental results validate the effectiveness of our approach, achieving an Area Under the Curve (AUC) score of 0.8704 on the test set and securing third place in the challenge.
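The two ingredients the abstract names, transformer-based fusion of modality features and a graph-distillation term that keeps each modality's branch discriminative, can be sketched loosely as follows. This is a hypothetical minimal reconstruction, not the authors' implementation: the class name `MMTGDSketch`, the feature dimensions, the per-modality heads, and the KL-based, learned-edge-weight distillation loss are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MMTGDSketch(nn.Module):
    """Hypothetical sketch of a multimodal transformer with graph distillation.

    Each modality (e.g. audio, video, text) is projected to a shared width,
    fused as a short token sequence by a transformer encoder, and regularised
    so that every modality branch softly mimics the others via learned
    pairwise edge weights (the "graph" in graph distillation).
    """

    def __init__(self, dims=(64, 128, 256), d_model=32, n_heads=4, n_classes=2):
        super().__init__()
        # One linear projection per modality into a shared embedding space.
        self.proj = nn.ModuleList([nn.Linear(d, d_model) for d in dims])
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=64, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)
        # A classification head per modality, plus one for the fused token mean.
        self.uni_heads = nn.ModuleList([nn.Linear(d_model, n_classes) for _ in dims])
        self.fused_head = nn.Linear(d_model, n_classes)
        # Learnable directed edge weights between modality nodes.
        self.edges = nn.Parameter(torch.zeros(len(dims), len(dims)))

    def forward(self, feats):
        # feats: list of per-modality feature vectors, each (batch, dim_m).
        tokens = torch.stack([p(x) for p, x in zip(self.proj, feats)], dim=1)
        fused = self.fusion(tokens)                       # (batch, n_modalities, d_model)
        uni_logits = [h(fused[:, i]) for i, h in enumerate(self.uni_heads)]
        return self.fused_head(fused.mean(dim=1)), uni_logits

    def graph_distillation_loss(self, uni_logits):
        # Row-normalised edge weights decide how strongly modality i
        # distils from modality j (teacher detached, KL divergence).
        w = F.softmax(self.edges, dim=1)
        loss = 0.0
        for i, li in enumerate(uni_logits):
            for j, lj in enumerate(uni_logits):
                if i != j:
                    loss = loss + w[i, j] * F.kl_div(
                        F.log_softmax(li, dim=-1),
                        F.softmax(lj.detach(), dim=-1),
                        reduction="batchmean")
        return loss

# Illustrative training step on random features (batch of 4, binary humor label).
model = MMTGDSketch()
feats = [torch.randn(4, d) for d in (64, 128, 256)]
logits, uni = model(feats)
labels = torch.randint(0, 2, (4,))
loss = F.cross_entropy(logits, labels) + 0.1 * model.graph_distillation_loss(uni)
loss.backward()
```

The weighting factor of 0.1 on the distillation term is likewise an arbitrary placeholder; in practice it would be a tuned hyperparameter balancing the task loss against the cross-modal regulariser.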


Cited By

  • The MuSe 2024 Multimodal Sentiment Analysis Challenge: Social Perception and Humor Recognition. In Proceedings of the 5th Multimodal Sentiment Analysis Challenge and Workshop: Social Perception and Humor (2024), 1--9. https://doi.org/10.1145/3689062.3689088
  • Social Perception Prediction for MuSe 2024: Joint Learning of Multiple Perceptions. In Proceedings of the 5th Multimodal Sentiment Analysis Challenge and Workshop: Social Perception and Humor (2024), 52--59. https://doi.org/10.1145/3689062.3689087
  • DPP: A Dual-Phase Processing Method for Cross-Cultural Humor Detection. In Proceedings of the 5th Multimodal Sentiment Analysis Challenge and Workshop: Social Perception and Humor (2024), 70--78. https://doi.org/10.1145/3689062.3689080
  • Efficient Feature Extraction and Late Fusion Strategy for Audiovisual Emotional Mimicry Intensity Estimation. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 4866--4872. https://doi.org/10.1109/CVPRW63382.2024.00490
  • MuSe 2023 Challenge: Multimodal Prediction of Mimicked Emotions, Cross-Cultural Humour, and Personalised Recognition of Affects. In Proceedings of the 31st ACM International Conference on Multimedia (2023), 9723--9725. https://doi.org/10.1145/3581783.3610943

Published In

MuSe '23: Proceedings of the 4th Multimodal Sentiment Analysis Challenge and Workshop: Mimicked Emotions, Humour and Personalisation
November 2023, 113 pages
ISBN: 9798400702709
DOI: 10.1145/3606039

Publisher

Association for Computing Machinery, New York, NY, United States

      Author Tags

      1. humor detection
      2. multimodal fusion
      3. multimodal sentiment analysis
      4. transformer

Conference

MM '23
Overall Acceptance Rate: 14 of 17 submissions, 82%
