DOI: 10.1145/3664647.3684994

Video Editing Chatbot: Language-Driven Video Compositing System

Published: 28 October 2024

Abstract

In this work, we present a video editing chatbot (VEC) that performs intelligent multimedia editing through natural language dialogue. VEC comprises three modules: instruction analysis, multimedia resource retrieval, and multimedia resource editing. It analyzes user instructions to retrieve relevant resources from a multimedia database (MMDB), and then automatically applies the appropriate editing methods from a multimedia toolbase (MMTB). To enhance the user experience and simplify operation, VEC uses a multi-turn dialogue mechanism to handle complex editing tasks.
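The analyze-retrieve-edit loop described above can be sketched as follows. Everything here is an illustrative assumption, not the paper's implementation: the MMDB and MMTB contents, the keyword-based `analyze` step (the paper's instruction-analysis module would presumably use a language model), and all function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# Toy stand-in for the multimedia database (MMDB): tag -> resource path.
MMDB = {
    "sunset": "clips/sunset.mp4",
    "interview": "clips/interview.mp4",
    "piano": "audio/piano.mp3",
}

# Toy stand-in for the multimedia toolbase (MMTB): editing verb -> operation.
def trim(resource: str) -> str:
    return f"trim({resource})"

def overlay(resource: str) -> str:
    return f"overlay({resource})"

MMTB = {"trim": trim, "cut": trim, "overlay": overlay, "add": overlay}

@dataclass
class DialogueState:
    # Multi-turn context: lets a follow-up instruction omit the resource.
    last_resource: Optional[str] = None
    history: list = field(default_factory=list)

def analyze(instruction: str):
    """Instruction analysis (keyword matching here): extract an
    editing verb and a resource tag from the user's utterance."""
    words = instruction.lower().split()
    verb = next((w for w in words if w in MMTB), None)
    tag = next((w for w in words if w in MMDB), None)
    return verb, tag

def handle_turn(state: DialogueState, instruction: str) -> str:
    verb, tag = analyze(instruction)
    if tag:  # retrieval step: look up the resource in the MMDB
        state.last_resource = MMDB[tag]
    if verb is None or state.last_resource is None:
        # Multi-turn mechanism: ask a clarifying question instead of failing.
        return "Could you clarify which clip or edit you mean?"
    result = MMTB[verb](state.last_resource)  # editing step via the MMTB
    state.history.append(result)
    return result

state = DialogueState()
print(handle_turn(state, "please trim the sunset clip"))        # -> trim(clips/sunset.mp4)
print(handle_turn(state, "now overlay it on the timeline"))     # -> overlay(clips/sunset.mp4)
```

The second turn omits the clip entirely; the dialogue state carries the retrieved resource forward, which is the point of the multi-turn mechanism.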

Supplemental Material

MP4 File - tde0037-video.mp4
This video presents a Language-Driven Video Editing Chatbot System (VEC). It introduces the technical framework behind VEC and provides an overview of its foundational interface and interactive features. The video demonstrates how VEC enables users to edit videos through natural language commands, highlighting its innovative approach to simplifying the video editing process.




Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
October 2024
11719 pages
ISBN:9798400706868
DOI:10.1145/3664647
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. cross-modal representation
  2. video-text retrieval
  3. vision-language understanding

Qualifiers

  • Abstract

Funding Sources

  • Horizontal Research Project
  • National Key R&D Program of China
  • Fundamental Research Funds for the Central Universities, China

Conference

MM '24
Sponsor:
MM '24: The 32nd ACM International Conference on Multimedia
October 28 - November 1, 2024
Melbourne VIC, Australia

Acceptance Rates

MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%;
Overall Acceptance Rate 2,145 of 8,556 submissions, 25%


Article Metrics

  • Total Citations: 0
  • Total Downloads: 114
  • Downloads (last 12 months): 114
  • Downloads (last 6 weeks): 60

Reflects downloads up to 27 Feb 2025
