ABSTRACT
Code generation is well suited to reverse-mode automatic differentiation, as it can store each partial derivative in a virtual register. Since PostgreSQL introduced just-in-time compilation, an efficient operator for automatic differentiation appears feasible. Such an operator would enable in-database machine learning and thus eliminate the need to extract data.
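To illustrate why reverse mode needs only one backward pass, here is a minimal Python sketch. It is not the paper's PostgreSQL/LLVM implementation. It evaluates an expression tree forward, then sweeps the tree backward while accumulating one adjoint (partial derivative) per node in a dictionary slot. That slot plays the role the abstract assigns to a virtual register in generated code. The `Node` type, the helper constructors and the `gradient` function are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(eq=False)  # eq=False keeps identity hashing so nodes can serve as dict keys
class Node:
    op: str                            # "var", "const", "add", "sub" or "mul"
    children: Tuple["Node", ...] = ()
    name: str = ""                     # variable name when op == "var"
    value: float = 0.0                 # literal when op == "const"


# Hypothetical helper constructors for building expression trees.
def var(name: str) -> Node: return Node("var", name=name)
def const(v: float) -> Node: return Node("const", value=v)
def add(a: Node, b: Node) -> Node: return Node("add", (a, b))
def sub(a: Node, b: Node) -> Node: return Node("sub", (a, b))
def mul(a: Node, b: Node) -> Node: return Node("mul", (a, b))


def topo_order(root: Node) -> List[Node]:
    """Post-order traversal: each child precedes its parent, shared nodes appear once."""
    order: List[Node] = []
    seen: Set[Node] = set()

    def visit(n: Node) -> None:
        if n in seen:
            return
        seen.add(n)
        for c in n.children:
            visit(c)
        order.append(n)

    visit(root)
    return order


def gradient(root: Node, env: Dict[str, float]) -> Tuple[float, Dict[str, float]]:
    order = topo_order(root)

    # Forward sweep: evaluate every node once, children first.
    val: Dict[Node, float] = {}
    for n in order:
        if n.op == "var":
            val[n] = env[n.name]
        elif n.op == "const":
            val[n] = n.value
        else:
            a, b = val[n.children[0]], val[n.children[1]]
            if n.op == "add":
                val[n] = a + b
            elif n.op == "sub":
                val[n] = a - b
            else:  # "mul"
                val[n] = a * b

    # Reverse sweep: one adjoint slot per node (a register in generated code).
    adj: Dict[Node, float] = {n: 0.0 for n in order}
    adj[root] = 1.0
    for n in reversed(order):
        if n.op not in ("add", "sub", "mul"):
            continue
        left, right = n.children
        if n.op == "add":
            d_left, d_right = 1.0, 1.0
        elif n.op == "sub":
            d_left, d_right = 1.0, -1.0
        else:  # product rule: d(l*r)/dl = r, d(l*r)/dr = l
            d_left, d_right = val[right], val[left]
        adj[left] += adj[n] * d_left
        adj[right] += adj[n] * d_right

    # Sum the adjoints of all leaves that refer to the same variable.
    grads: Dict[str, float] = {}
    for n in order:
        if n.op == "var":
            grads[n.name] = grads.get(n.name, 0.0) + adj[n]
    return val[root], grads


# Squared error of a linear model on one sample: (a*x + b - y)^2
a, b, x, y = var("a"), var("b"), var("x"), var("y")
err = sub(add(mul(a, x), b), y)
loss, grads = gradient(mul(err, err), {"a": 1.0, "b": 0.5, "x": 2.0, "y": 5.0})
print(loss, grads["a"], grads["b"])  # 6.25 -10.0 -5.0
```

A single backward sweep yields the derivatives with respect to every input at once. In this example, the sweep returns both ∂L/∂a = 2(ax + b - y)x = -10 and ∂L/∂b = 2(ax + b - y) = -5. The shared subexpression `err` receives contributions from both factors of the product.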
In this paper, we propose automatic differentiation in PostgreSQL: we extend our previously proposed SQL lambda functions so that they compute derivatives, including those of matrix operations, by traversing the expression tree. The evaluation shows that compiled execution is up to six times faster than interpreted execution for regression tasks.
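For matrix operations, the same backward traversal applies, with vector- or matrix-valued adjoints. As an illustration only, assume a least-squares linear regression loss L(w) = (Xw - y)ᵀ(Xw - y). The following pure-Python sketch applies the chain rule node by node (z = Xw, e = z - y, L = eᵀe) to obtain ∇L = 2Xᵀ(Xw - y), and then uses that gradient for batch gradient descent. The function names, learning rate and step count are hypothetical. The paper derives such gradients inside PostgreSQL through SQL lambda functions, whose syntax is not reproduced here.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(X: Matrix, w: Vector) -> Vector:
    """z = X w"""
    return [sum(xij * wj for xij, wj in zip(row, w)) for row in X]


def matvec_t(X: Matrix, v: Vector) -> Vector:
    """X^T v, computed without materialising the transpose."""
    return [sum(X[i][j] * v[i] for i in range(len(X))) for j in range(len(X[0]))]


def loss_and_grad(X: Matrix, y: Vector, w: Vector) -> Tuple[float, Vector]:
    # Forward sweep over the expression L = (X w - y)^T (X w - y).
    z = matvec(X, w)
    e = [zi - yi for zi, yi in zip(z, y)]
    loss = sum(ei * ei for ei in e)
    # Reverse sweep: propagate adjoints from L back to w.
    adj_e = [2.0 * ei for ei in e]   # L = e^T e  =>  dL/de = 2e
    adj_z = adj_e                    # e = z - y  =>  dL/dz = dL/de
    adj_w = matvec_t(X, adj_z)       # z = X w    =>  dL/dw = X^T dL/dz
    return loss, adj_w


def fit(X: Matrix, y: Vector, lr: float = 0.01, steps: int = 2000) -> Vector:
    """Batch gradient descent starting from w = 0."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        _, g = loss_and_grad(X, y, w)
        w = [wj - lr * gj for wj, gj in zip(w, g)]
    return w


# Points on y = 1 + 2x; the first column of X is the intercept term.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
print(loss_and_grad(X, y, [0.0, 0.0]))  # (84.0, [-32.0, -68.0])
print(fit(X, y))                         # close to [1.0, 2.0]
```

The transpose rule for z = Xw is what lets the reverse sweep stay in matrix form: the adjoint of w is Xᵀ times the adjoint of z. This sketch never materialises a Jacobian. A compiled operator could apply the same rule while traversing the expression tree.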
Teaching Blue Elephants the Maths for Machine Learning