
DataChat: An Intuitive and Collaborative Data Analytics Platform

Authors: Rogers Jeffrey Leo John, Jignesh M. Patel

SIGMOD '23: Companion of the 2023 International Conference on Management of Data

Pages 203 - 215

https://doi.org/10.1145/3555041.3589678

Published: 05 June 2023


Abstract

Enterprises invest in data platforms with the aim of extracting meaningful information through analytics. Typically, experts create analytics pipelines that feed into dashboards and provide answers to predetermined questions. This approach makes analytics a spectator sport for most people and introduces operational bottlenecks to leveraging those investments. To improve the value derived from data, many organizations are opting to open up their data assets and allow access to a wider range of users. However, using programming languages such as SQL and Python for analytics can be difficult for most enterprise users. DataChat provides a simplified data science approach that is intuitive, powerful, and accessible to all data users. The platform is built on a library of data functions that are cleanly abstracted to maximize efficiency and ease of use while maintaining a rich suite of tools necessary for data science. With these functions, users can create data analysis pipelines by using a simple point-and-click interface in a spreadsheet view or by using natural English interfaces. Modern sharing and collaboration features are central to all aspects of the platform, allowing teams to easily bridge expertise gaps. A deeper understanding of results is facilitated by providing automatically generated English explanations of how they were derived. By enhancing these aspects of data science and human-to-human communication, the platform addresses the needs that many organizations are encountering as their analytics needs mature.

Supplemental Material

MP4 File

Presentation video of the DataChat platform including large language model integration to generate complex data analytics pipelines from natural language user requests. The presentation starts by explaining how the DataChat platform simplifies data analytics operations into Skills users can execute individually. It then proceeds to explain how these Skills can be used as discrete building blocks in repeatable data analysis Recipes. Finally, the presentation demonstrates how such a Recipe can be fully generated using a large language model based on simple natural language prompts from the user.


