DOI: 10.1145/3581783.3617350

Panel: Multimodal Large Foundation Models

Published: 27 October 2023

Abstract

The surprisingly fluent predictive performance of Large Language Models (LLMs) and the high-quality, photo-realistic rendering of diffusion models have heralded a new era in Generative AI. Such deep-learning models, with billions of parameters and pre-trained on massive datasets, are also called Large Foundation Models (LFMs). These models have not only caught the public imagination but have also led to an unprecedented surge of interest in their applications. Instead of the previous approach of developing AI models for specific tasks, more and more researchers are developing large task-agnostic models pre-trained on massive data, which can then be adapted to a variety of downstream tasks via fine-tuning, few-shot learning, or zero-shot learning. Some examples are ChatGPT, LLaMA, GPT-4, Flamingo, MidJourney, Stable Diffusion, and DALL-E. Some of them handle only text (e.g., ChatGPT, LLaMA), while others (e.g., GPT-4 and Flamingo) can utilize multimodal data and can hence be considered Multimodal Large Foundation Models (MLFMs).
Several recent studies have shown that, when adapted to specific tasks (e.g., visual question answering), foundation models can often surpass the performance of state-of-the-art, fully supervised AI models. However, applying foundation models to specialized domain tasks (e.g., medical diagnosis, financial recommendation) raises many ethical issues (e.g., privacy, model bias, and hallucinations).
The panel members will discuss the emerging trends in the development and use of large multimodal foundation models. Some of the issues to be discussed are:

  • Research issues in going from LLM to MLFM
  • Behaviour of MLFM
  • Application potential of MLFM
  • Trust issues in MLFM
  • Limitations of MLFM
  • Societal, legal, and regulatory issues of MLFM
  • Promising future research in MLFM

This panel will bring together several leading experts from universities, research institutions, and industry who will discuss and debate together with the audience. We invite everybody to participate and contribute towards this important and promising research direction.
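
As a minimal illustration of the zero-shot adaptation mentioned above, the sketch below uses the openly available CLIP checkpoint via the Hugging Face transformers library to classify an image against arbitrary text labels without any task-specific training. The checkpoint name, image path, and label set are illustrative choices only, not part of the panel's material.

    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    # Load a pre-trained multimodal foundation model (CLIP) and its paired processor.
    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    # The image path and candidate labels are placeholders; the labels alone
    # define the downstream task -- no gradient updates are needed (zero-shot).
    image = Image.open("photo.jpg")
    labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

    inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)

    # Image-text similarity scores, normalized into per-label probabilities.
    probs = outputs.logits_per_image.softmax(dim=-1)
    print(dict(zip(labels, probs[0].tolist())))

Fine-tuning and few-shot adaptation follow the same pattern of reusing one pre-trained backbone across tasks, only with additional task-specific supervision or in-context examples.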

Published In

MM '23: Proceedings of the 31st ACM International Conference on Multimedia
October 2023
9913 pages
ISBN:9798400701085
DOI:10.1145/3581783
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 October 2023

Author Tags

  1. deep learning
  2. foundation models
  3. large language models
  4. multimodal models

Qualifiers

  • Panel

Conference

MM '23
MM '23: The 31st ACM International Conference on Multimedia
October 29 - November 3, 2023
Ottawa ON, Canada

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

