research-article

Teaching Blue Elephants the Maths for Machine Learning

Authors:
Clemens Ruck, TUM, Munich, DE, https://orcid.org/0000-0002-8471-8700
Maximilian Emanuel Schüle, University of Bamberg, Bamberg, DE, https://orcid.org/0000-0003-1546-269X

DEEM '23: Proceedings of the Seventh Workshop on Data Management for End-to-End Machine Learning, June 2023, Article No.: 2, Pages 1–4, https://doi.org/10.1145/3595360.3595852

Published: 18 June 2023

ABSTRACT

Code generation is well suited to reverse-mode automatic differentiation, as each partial derivative can be kept in a virtual register. Since the introduction of just-in-time compilation in PostgreSQL, an efficient operator for automatic differentiation seems feasible. Such an operator would allow for in-database machine learning and thus eliminate the need for data extraction.
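To illustrate the idea of one register per partial derivative, the following is a minimal sketch in Python, not the PostgreSQL operator itself. It generates straight-line source code in which every intermediate value and every partial derivative becomes its own local variable, a rough stand-in for virtual registers in compiled code. The tuple-based expression format and the names `generate_gradient_source` and `compile_gradient` are assumptions made for this example, and variable names are assumed not to clash with the generated `t0`, `t1`, ... temporaries.

```python
import itertools

OPERATORS = {"add": "+", "sub": "-", "mul": "*"}


def generate_gradient_source(expr, variables):
    """Emit straight-line Python source for expr and its reverse-mode gradient.

    expr is a nested tuple: ("var", name), ("const", value), or
    (op, left, right) with op in OPERATORS. This node format is hypothetical.
    """
    counter = itertools.count()
    registers = []  # names of all generated temporaries
    forward = []    # forward-pass statements
    tape = []       # (op, result, left, right), in evaluation order

    def new_register():
        name = f"t{next(counter)}"
        registers.append(name)
        return name

    def emit(node):
        kind = node[0]
        if kind == "var":
            return node[1]
        if kind == "const":
            reg = new_register()
            forward.append(f"{reg} = {float(node[1])!r}")
            return reg
        left, right = emit(node[1]), emit(node[2])
        reg = new_register()
        forward.append(f"{reg} = {left} {OPERATORS[kind]} {right}")
        tape.append((kind, reg, left, right))
        return reg

    root = emit(expr)
    lines = [f"def f({', '.join(variables)}):"]
    lines += [f"    {stmt}" for stmt in forward]
    # One "register" per partial derivative, all starting at zero.
    lines += [f"    d_{name} = 0.0" for name in list(variables) + registers]
    lines.append(f"    d_{root} = 1.0")
    # Backward pass: walk the tape in reverse and apply the chain rule.
    for kind, reg, left, right in reversed(tape):
        if kind == "mul":
            lines.append(f"    d_{left} += d_{reg} * {right}")
            lines.append(f"    d_{right} += d_{reg} * {left}")
        else:
            sign = "-" if kind == "sub" else "+"
            lines.append(f"    d_{left} += d_{reg}")
            lines.append(f"    d_{right} {sign}= d_{reg}")
    grads = ", ".join(f"{name!r}: d_{name}" for name in variables)
    lines.append(f"    return {root}, {{{grads}}}")
    return "\n".join(lines)


def compile_gradient(expr, variables):
    """Compile the generated source into a callable (sketch only)."""
    namespace = {}
    exec(generate_gradient_source(expr, variables), namespace)
    return namespace["f"]


# Squared error of a linear model: (a * x + b - y)^2
a, b, x, y = (("var", n) for n in "abxy")
residual = ("sub", ("add", ("mul", a, x), b), y)
loss = ("mul", residual, residual)
f = compile_gradient(loss, ["a", "b", "x", "y"])
value, grads = f(2.0, 1.0, 3.0, 5.0)
```

Printing `generate_gradient_source(loss, ["a", "b", "x", "y"])` shows the generated function body: a block of forward assignments followed by one accumulation statement per partial derivative, with no tree traversal left at run time.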

In this paper, we propose automatic differentiation in PostgreSQL: we extend our previously proposed SQL lambda functions to compute derivatives, even for matrix operations, by traversing the expression tree. The evaluation shows that compiled execution is up to six times faster than interpreted execution for regression tasks.
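As a rough illustration of deriving matrix operations by traversing an expression tree, the sketch below implements reverse-mode differentiation for a least-squares regression loss in plain Python. It is an interpreted toy, not the paper's operator: the node classes `Leaf`, `MatMul`, `Sub` and `SumSquares` and the list-of-lists matrix representation are assumptions made for this example.

```python
def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def elementwise(fn, a, b):
    return [[fn(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


class Leaf:
    """Input matrix; accumulates the gradient that reaches it."""

    def __init__(self, value):
        self.value = value
        self.grad = [[0.0] * len(row) for row in value]

    def forward(self):
        return self.value

    def backward(self, upstream):
        self.grad = elementwise(lambda g, u: g + u, self.grad, upstream)


class MatMul:
    def __init__(self, left, right):
        self.left, self.right = left, right

    def forward(self):
        self.l, self.r = self.left.forward(), self.right.forward()
        return matmul(self.l, self.r)

    def backward(self, upstream):
        # d(L R)/dL = upstream * R^T, d(L R)/dR = L^T * upstream
        self.left.backward(matmul(upstream, transpose(self.r)))
        self.right.backward(matmul(transpose(self.l), upstream))


class Sub:
    def __init__(self, left, right):
        self.left, self.right = left, right

    def forward(self):
        return elementwise(lambda x, y: x - y,
                           self.left.forward(), self.right.forward())

    def backward(self, upstream):
        self.left.backward(upstream)
        self.right.backward([[-u for u in row] for row in upstream])


class SumSquares:
    """Scalar root node: sum of squared matrix entries."""

    def __init__(self, child):
        self.child = child

    def forward(self):
        self.c = self.child.forward()
        return sum(x * x for row in self.c for x in row)

    def backward(self, upstream=1.0):
        self.child.backward([[2.0 * upstream * x for x in row] for row in self.c])


# Least-squares loss sum((X w - y)^2) for a tiny regression problem.
X = Leaf([[1.0, 2.0], [3.0, 4.0]])
w = Leaf([[0.5], [-1.0]])
y = Leaf([[1.0], [2.0]])
loss = SumSquares(Sub(MatMul(X, w), y))
value = loss.forward()   # forward pass: evaluate the tree bottom-up
loss.backward()          # backward pass: push gradients from root to leaves
```

After the backward pass, `w.grad` holds the gradient of the loss with respect to the weights, which a gradient-descent step for the regression task would consume.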

Published in
DEEM '23: Proceedings of the Seventh Workshop on Data Management for End-to-End Machine Learning
June 2023
51 pages
ISBN:9798400702044
DOI:10.1145/3595360
Co-chairs:
Matthias Boehm,
Madelon Hulsebos,
Shreya Shankar,
Paroma Varma
Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
in-database machine learning
automatic differentiation
Acceptance Rates
Overall Acceptance Rate: 23 of 37 submissions, 62%