DOI: 10.1145/3626246.3654732
short-paper

Demonstrating CAESURA: Language Models as Multi-Modal Query Planners

Published: 09 June 2024

Abstract

In many domains, multi-modal data plays an important role, and modern question-answering (QA) systems based on Large Language Models (LLMs) allow users to query this data using simple natural language queries. Retrieval Augmented Generation (RAG) is a recent approach that extends LLMs with database technology to enable such multi-modal QA systems. In RAG, relevant data is first retrieved from a vector database and then fed into an LLM that computes the query result. However, RAG-based approaches have severe issues, such as limited efficiency and scalability, since LLMs have high inference costs and can only process limited amounts of data. Therefore, in this demo paper, we propose CAESURA, a database-first approach that extends databases with LLMs. The main idea is that CAESURA utilizes the reasoning capabilities of LLMs to translate natural language queries into execution plans. Using such execution plans allows CAESURA to process multi-modal data outside the LLM using query operators and optimization strategies that are grounded in the scalable query execution strategies of databases. Our demo allows users to experience CAESURA on two example datasets containing tables, texts, and images.
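
The plan-then-execute idea in the abstract can be illustrated with a minimal sketch. It is not CAESURA's actual plan format, operator set, or API: the operator names, the toy tables, the `FAKE_LABELS` stub standing in for a vision model, and the hand-written `PLAN` (which a real system would obtain from an LLM) are all assumptions made for illustration. The point it shows is only that once a plan exists, the data is processed by ordinary operators outside the LLM.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Table = List[Row]

# Toy multi-modal "database": one table with image paths, one with metadata.
TABLES: Dict[str, Table] = {
    "paintings": [
        {"id": 1, "image": "a.png"},
        {"id": 2, "image": "b.png"},
        {"id": 3, "image": "c.png"},
    ],
    "metadata": [
        {"id": 1, "century": 17},
        {"id": 2, "century": 17},
        {"id": 3, "century": 18},
    ],
}

# Hypothetical stand-in for a vision model: image path -> detected labels.
FAKE_LABELS = {"a.png": {"horse", "rider"}, "b.png": {"ship"}, "c.png": {"horse"}}


def scan(tables: Dict[str, Table], rows: Table, name: str) -> Table:
    """Read a base table, ignoring any incoming rows."""
    return [dict(r) for r in tables[name]]


def join(tables: Dict[str, Table], rows: Table, name: str, on: str) -> Table:
    """Nested-loop equi-join of the current rows with a base table."""
    return [{**l, **r} for l in rows for r in tables[name] if l[on] == r[on]]


def select(tables: Dict[str, Table], rows: Table, column: str, value: Any) -> Table:
    """Relational selection on an equality predicate."""
    return [r for r in rows if r[column] == value]


def visual_filter(tables: Dict[str, Table], rows: Table, column: str, label: str) -> Table:
    """Multi-modal operator: keep rows whose image contains the given label."""
    return [r for r in rows if label in FAKE_LABELS.get(r[column], set())]


def count(tables: Dict[str, Table], rows: Table) -> Table:
    """Aggregate the current rows into a single count."""
    return [{"count": len(rows)}]


OPERATORS: Dict[str, Callable[..., Table]] = {
    "scan": scan,
    "join": join,
    "select": select,
    "visual_filter": visual_filter,
    "count": count,
}

# Hand-written plan for the made-up question
# "How many paintings from the 17th century depict a horse?"
PLAN = [
    {"op": "scan", "args": {"name": "paintings"}},
    {"op": "join", "args": {"name": "metadata", "on": "id"}},
    {"op": "select", "args": {"column": "century", "value": 17}},
    {"op": "visual_filter", "args": {"column": "image", "label": "horse"}},
    {"op": "count"},
]


def execute(plan: List[Dict[str, Any]], tables: Dict[str, Table]) -> Table:
    """Run the plan step by step; each operator consumes the previous output."""
    rows: Table = []
    for step in plan:
        rows = OPERATORS[step["op"]](tables, rows, **step.get("args", {}))
    return rows


result = execute(PLAN, TABLES)
print(result)
```

In this sketch the cheap relational selection runs before the image operator, so the (here simulated) model is applied to fewer rows; that ordering is one example of the kind of optimization a plan-based design makes possible.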

Supplemental Material

MP4 File
Presentation video of CAESURA

Cited By

  • (2024) ELEET: Efficient Learned Query Execution over Text and Tables. Proceedings of the VLDB Endowment 17:13 (4867-4880). DOI: 10.14778/3704965.3704989. Online publication date: 1-Sep-2024
  • (2024) ReAcTable: Enhancing ReAct for Table Question Answering. Proceedings of the VLDB Endowment 17:8 (1981-1994). DOI: 10.14778/3659437.3659452. Online publication date: 31-May-2024

    Published In

    SIGMOD/PODS '24: Companion of the 2024 International Conference on Management of Data
    June 2024
    694 pages
    ISBN:9798400704222
    DOI:10.1145/3626246
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. large language models
    2. multi-modal
    3. query planning

    Qualifiers

    • Short-paper

    Funding Sources

    • BMBF and State of Hesse

    Conference

    SIGMOD/PODS '24

    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Article Metrics

    • Downloads (last 12 months): 252
    • Downloads (last 6 weeks): 52
    Reflects downloads up to 01 Mar 2025
