invited-talk

Addressing Challenges in Data Science: Scale, Skill Sets and Complexity

Author:
Joseph Bradley

Databricks, Inc., San Francisco, CA, USA


KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, July 2019, page 3163. https://doi.org/10.1145/3292500.3340407

Published: 25 July 2019


ABSTRACT

Data science in modern applications is pushing the limits of both tools and organizations. The scale of data, the breadth of required skill sets, and the complexity of workflows all cause organizations to stumble when developing data-powered applications and moving them into production. This talk will discuss these challenges and Databricks' efforts to overcome them within open-source software projects such as Apache Spark and MLflow.

Apache Spark has simplified large-scale ETL and analytics, and its Project Hydrogen helps bridge the gap between Spark and ML tools such as TensorFlow and Horovod. MLflow, an open-source platform for managing the ML lifecycle, facilitates experimentation, reproducibility, and deployment. We will present insights from our collaborations on these projects, as well as Databricks' perspective on facilitating data science for a wide variety of organizations and applications.
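To make the experiment-tracking idea behind MLflow a little more concrete, here is a minimal sketch in plain Python. It is hypothetical: `ToyTracker`, `Run`, and their methods are invented for illustration and are not MLflow's API. MLflow's own tracking calls (for example `mlflow.log_param` and `mlflow.log_metric`) persist runs to a tracking store. The sketch only shows the core pattern: each run records its parameters and metrics so that runs can later be compared and reproduced.

```python
import json
import time
import uuid
from dataclasses import dataclass, field


@dataclass
class Run:
    """One tracked training run (hypothetical structure, not MLflow's)."""
    run_id: str
    params: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)
    start_time: float = field(default_factory=time.time)


class ToyTracker:
    """Hypothetical in-memory tracker illustrating the experiment-tracking idea."""

    def __init__(self):
        self.runs = []

    def start_run(self) -> Run:
        run = Run(run_id=uuid.uuid4().hex)
        self.runs.append(run)
        return run

    def log_param(self, run: Run, key: str, value) -> None:
        # A parameter is fixed for the life of a run, so a conflicting
        # second value is rejected; this keeps the run reproducible.
        if key in run.params and run.params[key] != value:
            raise ValueError(f"param {key!r} already logged with a different value")
        run.params[key] = value

    def log_metric(self, run: Run, key: str, value: float) -> None:
        # Metrics keep their full history (e.g., one value per epoch).
        run.metrics.setdefault(key, []).append(float(value))

    def best_run(self, metric: str, higher_is_better: bool = True) -> Run:
        # Compare runs by the last logged value of the chosen metric.
        candidates = [r for r in self.runs if metric in r.metrics]
        if not candidates:
            raise ValueError(f"no run logged metric {metric!r}")

        def last_value(r: Run) -> float:
            return r.metrics[metric][-1]

        pick = max if higher_is_better else min
        return pick(candidates, key=last_value)

    def export(self) -> str:
        # Serialize params and metrics so results can be shared or audited.
        return json.dumps(
            [{"run_id": r.run_id, "params": r.params, "metrics": r.metrics}
             for r in self.runs],
            sort_keys=True,
        )


# Usage: sweep a hypothetical learning-rate parameter.
tracker = ToyTracker()
for lr in (0.1, 0.01):
    run = tracker.start_run()
    tracker.log_param(run, "learning_rate", lr)
    # Hypothetical stand-in for training and evaluating a model.
    score = 1.0 - abs(lr - 0.03)
    tracker.log_metric(run, "score", score)

best = tracker.best_run("score")
print(best.params)  # {'learning_rate': 0.01}
```

Keeping parameters immutable per run while letting metrics accumulate is one simple way to separate a run's configuration from its results; a real platform adds persistent storage, artifacts, and model packaging on top of this pattern.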

Supplemental Material

p3163-bradley.mp4 (MP4 video, 2 GB)


Published in
KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2019
3305 pages
ISBN: 9781450362016
DOI: 10.1145/3292500
General Chairs: Ankur Teredesai (KenSci), Vipin Kumar (University of Minnesota)
Program Chairs: Ying Li (EV Analysis Corporation), Rómer Rosales (LinkedIn), Evimaria Terzi (Boston University), George Karypis (University of Minnesota)
Copyright © 2019 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher: Association for Computing Machinery, New York, NY, United States
Acceptance Rates
KDD '19 paper acceptance rate: 110 of 1,200 submissions (9%). Overall acceptance rate: 1,133 of 8,635 submissions (13%).