demonstration

mPLUG-Octopus: The Versatile Assistant Empowered by A Modularized End-to-End Multimodal LLM

Authors:

Changsheng Xu

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

Pages 9365 - 9367

https://doi.org/10.1145/3581783.3612665

Published: 27 October 2023

Abstract

Inspired by recent developments in large language models (LLMs), we propose mPLUG-Octopus, a versatile conversational assistant designed to provide users with coherent, engaging, and helpful interaction in both text-only and multimodal scenarios. Unlike traditional pipeline chat systems, mPLUG-Octopus offers a diverse range of creative capabilities, including open-domain QA, multi-turn chat, and multimodal creation, all built on a unified multimodal LLM without relying on any external API. Using modularized end-to-end multimodal LLM technology, mPLUG-Octopus efficiently supports engaging open-domain conversation. It exhibits a wide range of unimodal and multimodal elemental capabilities, enabling it to communicate with users on open-domain topics and sustain multi-turn conversations. It also assists users with various content creation and application tasks. The assistant can also be deployed on smart hardware to drive advanced AIGC applications.

Cited By

Chin J, Lee S, Park C, Yeoun M (2024). Proposal of User Interface Based on Heavy User Usage Analysis in LLM Service. Archives of Design Research 37:4 (287-313). Online publication date: 31-Aug-2024.
https://doi.org/10.15187/adr.2024.08.37.4.287

Index Terms

mPLUG-Octopus: The Versatile Assistant Empowered by A Modularized End-to-End Multimodal LLM
1. Computing methodologies
  1. Artificial intelligence

Published In

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

October 2023

9913 pages

ISBN:9798400701085

DOI:10.1145/3581783

General Chairs:
Abdulmotaleb El Saddik, University of Ottawa, Canada & MBZUAI, UAE
Tao Mei, HiDream.ai, China
Rita Cucchiara, University of Modena and Reggio Emilia, Italy

Program Chairs:
Marco Bertini, University of Florence, Italy
Diana Patricia Tobon Vallejo, Universidad de Medellin, Colombia
Pradeep K. Atrey, University at Albany, State University of New York, USA
M. Shamim Hossain, King Saud University, KSA

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Demonstration

Funding Sources

Beijing Natural Science Foundation
National Natural Science Foundation of China

Conference

MM '23

Sponsor:

SIGMM

MM '23: The 31st ACM International Conference on Multimedia

October 29 - November 3, 2023

Ottawa ON, Canada

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics

Total Citations: 1
Total Downloads: 284
Downloads (Last 12 months): 162
Downloads (Last 6 weeks): 9

Reflects downloads up to 01 Mar 2025
