DOI: 10.1145/3555041.3589731
short-paper

SmokedDuck Demonstration: SQLStepper

Published: 05 June 2023

Abstract

Fine-grained lineage tracks the relationships between the input and output of a query, and is particularly useful in analytical applications such as query debugging, view maintenance, query explanation, and data cleaning. Prior approaches rewrite SQL queries to also track lineage, but this can slow query execution in analytical engines that are designed to process complex query patterns on large datasets. Moreover, they mainly capture lineage at the logical level. SmokedDuck extends DuckDB to support fast lineage capture and querying: it tracks lineage at the instruction level, leveraging the duality between lineage and data movement. In this demonstration, we show how a user can use operator-level lineage to understand and debug a query execution through SQLStepper, an application built on top of SmokedDuck. Users upload data and execute queries using an in-browser command line, then visually explore query-level and operator-level lineage to track down bugs.
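To make the idea of operator-level versus query-level lineage concrete, the following is a minimal sketch in plain Python. It is not SmokedDuck's implementation, which instruments DuckDB's physical operators; the operator functions (`filter_op`, `group_sum_op`), the `compose` helper, and the `sales` data are all hypothetical names chosen for illustration. Each toy operator records, while it moves a row to its output, which input positions produced that row; composing the per-operator mappings yields lineage for the whole query.

```python
from collections import defaultdict


def filter_op(rows, predicate):
    """Keep rows that satisfy predicate; lineage[i] lists the input positions behind output i."""
    out, lineage = [], []
    for rid, row in enumerate(rows):
        if predicate(row):
            out.append(row)
            lineage.append([rid])  # recorded at the moment the row is copied out
    return out, lineage


def group_sum_op(rows, key, value):
    """Sum `value` per `key`; lineage[i] lists the input positions folded into group i."""
    groups = defaultdict(lambda: [0, []])  # key -> [running total, contributing positions]
    for rid, row in enumerate(rows):
        slot = groups[row[key]]
        slot[0] += row[value]
        slot[1].append(rid)
    out = [{key: k, "total": total} for k, (total, _) in groups.items()]
    lineage = [rids for _, rids in groups.values()]
    return out, lineage


def compose(upstream, downstream):
    """Chain two operator-level lineage maps into base-table positions per final output row."""
    return [sorted(r for mid in mids for r in upstream[mid]) for mids in downstream]


# Hypothetical input table.
sales = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": -5},
    {"region": "east", "amount": 7},
    {"region": "west", "amount": 3},
]

filtered, filter_lineage = filter_op(sales, lambda r: r["amount"] > 0)
totals, agg_lineage = group_sum_op(filtered, "region", "amount")
query_lineage = compose(filter_lineage, agg_lineage)
```

Here `agg_lineage` is operator-level lineage (positions in the filter's output), while `query_lineage` points back to positions in `sales`. Inspecting the intermediate mappings one operator at a time is the kind of step-by-step debugging the abstract describes, under the simplifying assumption that rows are identified by list position.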

Supplemental Material

MP4 File
This video showcases SmokedDuck, a fork of DuckDB that uses physical instrumentation techniques to capture operator-level and query-level lineage with low overhead and to answer lineage queries quickly. SmokedDuck provides both relational and custom interfaces for accessing lineage.

Published In

SIGMOD '23: Companion of the 2023 International Conference on Management of Data
June 2023
330 pages
ISBN:9781450395076
DOI:10.1145/3555041
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. OLAP
  2. data exploration
  3. data provenance
  4. user interfaces

Qualifiers

  • Short-paper

Data Availability

Demonstration video: https://dl.acm.org/doi/10.1145/3555041.3589731#smokedduck_demo_v2.mp4

Conference

SIGMOD/PODS '23
Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%
