DOI: 10.1145/3664647.3676596
Keynote

Large Multimodal Models as Social Multimedia Analysis Engines

Published: 28 October 2024

Abstract

Recent research has offered insights into the extraordinary capabilities of Large Multimodal Models (LMMs) across a range of general vision and language tasks, and there is growing interest in how LMMs perform in more specialized domains. Social media content is inherently multimodal, blending text, images, videos, and sometimes audio. To understand such content effectively, a model must interpret the intricate interactions among these communication modalities and how they shape the conveyed message, which remains a challenging problem for contemporary machine learning frameworks. In this talk, we evaluate the capabilities of GPT-4V(ision) for social multimedia analysis on five representative tasks: sentiment analysis, hate speech detection, fake news identification, demographic inference, and political ideology detection. For each task, we begin with a preliminary quantitative analysis on existing benchmark datasets, then carefully review the results and select qualitative samples that illustrate GPT-4V's potential for understanding multimodal social media content. GPT-4V demonstrates remarkable efficacy on these tasks, showing strengths such as joint understanding of image-text pairs, contextual and cultural awareness, and extensive commonsense knowledge. Beyond the known hallucination problem, notable challenges remain: GPT-4V struggles with multilingual social multimedia comprehension and has difficulty generalizing to the latest social media trends. We further present several attempts to improve performance on some of these tasks. Together, these findings underscore a promising future for LMMs in deepening our understanding of social media content and its users through the analysis of multimodal information.

Information

Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
October 2024
11719 pages
ISBN: 9798400706868
DOI: 10.1145/3664647
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. demographic inference
  2. fake news identification
  3. hate speech detection
  4. large multimodal models
  5. political ideology detection
  6. sentiment analysis
  7. social multimedia

Qualifiers

  • Keynote

Conference

MM '24
MM '24: The 32nd ACM International Conference on Multimedia
October 28 - November 1, 2024
Melbourne VIC, Australia

Acceptance Rates

MM '24 paper acceptance rate: 1,150 of 4,385 submissions (26%)
Overall acceptance rate: 2,145 of 8,556 submissions (25%)

Article Metrics

  • Total citations: 0
  • Total downloads: 239
  • Downloads (last 12 months): 239
  • Downloads (last 6 weeks): 147
Reflects downloads up to 05 Mar 2025
