DOI: 10.1145/3555041.3589722
short-paper

CoWrangler: Recommender System for Data-Wrangling Scripts

Published: 05 June 2023

Abstract

We present CoWrangler, a real-time data-wrangling recommender system that recommends the next-best data-wrangling operations, along with corresponding human-readable and efficient code snippets, to expedite data exploration and wrangling. A key feature of CoWrangler is that it explains its generated suggestions in the form of data insights, allowing the user to place confidence in the system. Under the hood, CoWrangler intelligently generates candidate suggestions using program synthesis techniques and ranks them based on the notion of data quality improvement. We demonstrate how CoWrangler provides a human-in-the-loop data-wrangling experience and helps users make informed data pre-processing decisions while saving time and effort.
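
The generate-then-rank loop described in the abstract can be illustrated with a minimal Python sketch. This is not CoWrangler's implementation: the candidate set is hand-enumerated rather than produced by program synthesis, and the quality score (an equal-weight penalty on missing cells and duplicate rows) is an assumption chosen only for illustration. All names (`Suggestion`, `quality`, `candidates`, `recommend`) are hypothetical, and the pandas-style snippets are plain strings shown to the user, not executed.

```python
from collections import Counter
from typing import NamedTuple

Table = list[dict]  # each row maps column name -> value; None marks a missing cell


class Suggestion(NamedTuple):
    name: str
    snippet: str   # human-readable code to show the user (illustrative string only)
    insight: str   # data insight explaining why the operation is suggested
    gain: float    # quality(after) - quality(before)


def _row_key(row):
    return tuple(sorted(row.items(), key=lambda kv: kv[0]))


def quality(table: Table) -> float:
    """Hypothetical score in [0, 1]: penalize missing cells and duplicate rows equally."""
    if not table:
        return 1.0
    cells = [v for row in table for v in row.values()]
    missing = sum(v is None for v in cells) / len(cells) if cells else 0.0
    duplicates = 1 - len({_row_key(r) for r in table}) / len(table)
    return 1 - 0.5 * missing - 0.5 * duplicates


def drop_duplicates(table: Table) -> Table:
    seen, out = set(), []
    for row in table:
        key = _row_key(row)
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


def drop_missing(table: Table) -> Table:
    return [row for row in table if None not in row.values()]


def fill_with_mode(column: str):
    def op(table: Table) -> Table:
        present = [r[column] for r in table if r[column] is not None]
        mode = Counter(present).most_common(1)[0][0] if present else None
        return [{**r, column: mode if r[column] is None else r[column]} for r in table]
    return op


def candidates(table: Table):
    """Hand-written stand-in for synthesis-based candidate generation."""
    n_dup = len(table) - len(drop_duplicates(table))
    yield ("drop_duplicates", "df = df.drop_duplicates()",
           f"{n_dup} duplicate row(s) found", drop_duplicates)
    for col in (table[0].keys() if table else []):
        n_missing = sum(r[col] is None for r in table)
        if n_missing:
            yield (f"fill_missing[{col}]",
                   f"df['{col}'] = df['{col}'].fillna(df['{col}'].mode()[0])",
                   f"{n_missing} missing value(s) in '{col}'", fill_with_mode(col))
    yield ("drop_missing", "df = df.dropna()",
           "rows with missing values found", drop_missing)


def recommend(table: Table) -> list[Suggestion]:
    """Apply each candidate to the table and rank by quality improvement."""
    base = quality(table)
    scored = [Suggestion(name, snippet, insight, quality(op(table)) - base)
              for name, snippet, insight, op in candidates(table)]
    return sorted((s for s in scored if s.gain > 0), key=lambda s: s.gain, reverse=True)


if __name__ == "__main__":
    rows = [
        {"city": "Paris", "age": 30},
        {"city": "Paris", "age": 30},
        {"city": "Rome", "age": None},
        {"city": "Oslo", "age": 41},
    ]
    for s in recommend(rows):
        print(f"{s.gain:+.3f}  {s.snippet}  ({s.insight})")
```

Each suggestion carries both a code snippet and an insight string, mirroring the abstract's pairing of recommendations with explanations. A real scoring function would also need to account for information lost by an operation (for example, rows removed by dropping missing values), which this sketch ignores.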

Supplemental Material

MP4 File


    Published In

    SIGMOD '23: Companion of the 2023 International Conference on Management of Data
    June 2023
    330 pages
    ISBN:9781450395076
    DOI:10.1145/3555041
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. automated suggestions
    2. data wrangling
    3. predictive synthesis

    Qualifiers

    • Short-paper

    Data Availability

    https://dl.acm.org/doi/10.1145/3555041.3589722#SIGMOD23-demo-42.mp4

    Conference

    SIGMOD/PODS '23
    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%
