Research article · DOI: 10.1145/3689062.3689085 · MuSe '24 Conference Proceedings

Modality Weights Based Fusion Model for Social Perception Prediction in Video, Audio, and Text

Published: 28 October 2024

Abstract

Social perception is a crucial psychological concept that explains how we understand and interpret others and their behaviors. It encompasses the complex process of discerning individual characteristics, intentions, and emotions, and it strongly influences social interaction and decision-making. In this paper, we propose a modality-weights-based fusion model for predicting the 16 social attributes of the MuSe 2024 Challenge subtask MuSe-Perception. The proposed model uses visual, audio, and text data from interviews with Chief Executive Officers (CEOs) to predict these 16 attributes. The study focuses mainly on audio feature learning and on cross-attention over the audio model outputs. In addition, we develop a modality-weight-based fusion network that combines the outputs of the individual modalities according to learned weights. Experimental results show that the proposed model achieves competitive performance on some social attributes but falls short of consistent performance across all of them. Based on these results, future work includes collecting additional CEO video data and learning the importance of each modality for each attribute. This study is expected to contribute to systems that analyze investment potential by predicting CEOs' social attributes.
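To make the fusion idea concrete, below is a minimal PyTorch sketch of a modality-weight-based fusion head: each modality's pooled embedding is mapped to the 16 attribute scores, and the per-modality outputs are combined with softmax-normalized learnable weights. The module name, the 768-dimensional embeddings, the linear heads, and the softmax normalization are illustrative assumptions for this sketch, not the authors' published implementation.

```python
import torch
import torch.nn as nn


class WeightedModalityFusion(nn.Module):
    """Combine per-modality attribute predictions with learnable, softmax-normalized weights.

    Illustrative sketch only; layer sizes and the exact fusion rule are assumptions.
    """

    def __init__(self, dim: int, num_attributes: int = 16):
        super().__init__()
        # One regression head per modality (video, audio, text); `dim` is the pooled embedding size.
        self.heads = nn.ModuleDict(
            {m: nn.Linear(dim, num_attributes) for m in ("video", "audio", "text")}
        )
        # One learnable scalar per modality, turned into fusion weights with a softmax.
        self.modality_logits = nn.Parameter(torch.zeros(3))

    def forward(self, video: torch.Tensor, audio: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        feats = {"video": video, "audio": audio, "text": text}
        # Stack per-modality predictions: (num_modalities, batch, num_attributes).
        preds = torch.stack([self.heads[m](x) for m, x in feats.items()], dim=0)
        weights = torch.softmax(self.modality_logits, dim=0).view(-1, 1, 1)
        # Weighted sum over the modality axis yields the fused attribute scores.
        return (weights * preds).sum(dim=0)


if __name__ == "__main__":
    # Dummy pooled embeddings (batch of 4, 768-dim per modality),
    # e.g. from audio, text, and visual backbones.
    fusion = WeightedModalityFusion(dim=768)
    v, a, t = torch.randn(4, 768), torch.randn(4, 768), torch.randn(4, 768)
    print(fusion(v, a, t).shape)  # torch.Size([4, 16])
```

Because the weights are shared across attributes in this sketch, it reflects a single global trade-off between modalities; learning per-attribute weights, as the abstract suggests for future work, would replace the three scalars with a (3 x 16) parameter matrix.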


Cited By

  • (2025) Enhanced Emotion Recognition Through Dynamic Restrained Adaptive Loss and Extended Multimodal Bottleneck Transformer. Applied Sciences 15(5), 2862. https://doi.org/10.3390/app15052862. Online publication date: 6-Mar-2025.
  • (2024) MuSe '24: The 5th Multimodal Sentiment Analysis Challenge and Workshop: Social Perception & Humor. Proceedings of the 5th on Multimodal Sentiment Analysis Challenge and Workshop: Social Perception and Humor, 10-11. https://doi.org/10.1145/3689062.3695939. Online publication date: 28-Oct-2024.

Published In

MuSe'24: Proceedings of the 5th on Multimodal Sentiment Analysis Challenge and Workshop: Social Perception and Humor
October 2024
76 pages
ISBN:9798400711992
DOI:10.1145/3689062
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 October 2024

Author Tags

  1. emotion recognition
  2. multimodal fusion
  3. social perception prediction

Qualifiers

  • Research-article

Funding Sources

  • Institute of Information & Communications Technology Planning & Evaluation (IITP) under the Artificial Intelligence Convergence Innovation Human Resources Development grant funded by the Korea government (MSIT)
  • National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT)

Conference

MM '24: The 32nd ACM International Conference on Multimedia
October 28 - November 1, 2024
Melbourne VIC, Australia

Acceptance Rates

Overall Acceptance Rate 14 of 17 submissions, 82%
