DOI: 10.1145/3581783.3612665
Demonstration

mPLUG-Octopus: The Versatile Assistant Empowered by A Modularized End-to-End Multimodal LLM

Published: 27 October 2023

Abstract

Inspired by recent developments in large language models (LLMs), we propose mPLUG-Octopus, a versatile conversational assistant designed to give users coherent, engaging, and helpful interactions in both text-only and multimodal scenarios. Unlike traditional pipeline-based chat systems, mPLUG-Octopus offers a diverse range of creative capabilities, including open-domain QA, multi-turn chat, and multimodal content creation, all built on a single unified multimodal LLM without relying on any external API. Its modularized end-to-end multimodal LLM technology lets it efficiently support engaging, open-domain conversation. It exhibits a wide range of elemental unimodal and multimodal capabilities, enabling it to communicate seamlessly with users on open-domain topics and to sustain multi-turn conversations. It also helps users accomplish a variety of content creation and application tasks. The assistant can also be deployed on smart hardware to drive advanced AI-generated content (AIGC) applications.


Cited By

  • (2024) Proposal of User Interface Based on Heavy User Usage Analysis in LLM Service. Archives of Design Research 37(4), 287-313. DOI: 10.15187/adr.2024.08.37.4.287. Online publication date: 31 August 2024.


    Published In

    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023
    9913 pages
    ISBN:9798400701085
    DOI:10.1145/3581783
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. instruction tuning
    2. large language model
    3. modularized model
    4. multi-modal dialogue chatbot
    5. multi-modal model

    Qualifiers

    • Demonstration

    Conference

    MM '23
    MM '23: The 31st ACM International Conference on Multimedia
    October 29 - November 3, 2023
    Ottawa ON, Canada

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Article Metrics

    • Downloads (last 12 months): 198
    • Downloads (last 6 weeks): 6
    Reflects downloads up to 26 December 2024.
