DOI: 10.1145/3595360.3595852
research-article

Teaching Blue Elephants the Maths for Machine Learning

Published: 18 June 2023

ABSTRACT

Code generation is well suited to reverse-mode automatic differentiation, as it can store each partial derivative in a virtual register. Since the introduction of just-in-time compilation in PostgreSQL, an efficient operator for automatic differentiation appears feasible. Such an operator would enable in-database machine learning and thus eliminate the need to extract data from the database.
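
To make the first claim concrete, here is a minimal sketch, assuming a toy expression tree with only +, - and * and Python source as the code-generation target. The generator walks the tree once to emit the forward computation, then emits the reverse (adjoint) pass as straight-line code in which every intermediate value v<i> and every partial derivative dv<i> is its own local variable, loosely analogous to a virtual register in LLVM IR. The node classes and `compile_gradient` are hypothetical illustrations, not the paper's PostgreSQL implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union


# Hypothetical expression-tree nodes (illustrative, not the paper's structures).
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class BinOp:
    op: str  # one of "+", "-", "*"
    left: "Expr"
    right: "Expr"


Expr = Union[Var, Const, BinOp]


def compile_gradient(expr: Expr, params: List[str]) -> Tuple[Callable, str]:
    """Generate and compile straight-line code returning (value, gradient).

    `params` must list every variable name used in `expr`; names must be valid
    Python identifiers that do not clash with the generated v<i>/dv<i> names.
    """
    lines: List[str] = []
    nodes: List[Tuple[str, Expr, Tuple[str, ...]]] = []  # post-order
    counter = [0]

    def forward(e: Expr) -> str:
        # Allocate a fresh "register" for this node's value.
        reg = f"v{counter[0]}"
        counter[0] += 1
        if isinstance(e, Var):
            lines.append(f"    {reg} = {e.name}")
            kids: Tuple[str, ...] = ()
        elif isinstance(e, Const):
            lines.append(f"    {reg} = {e.value!r}")
            kids = ()
        else:
            if e.op not in ("+", "-", "*"):
                raise ValueError(f"unsupported operator {e.op!r}")
            left, right = forward(e.left), forward(e.right)
            lines.append(f"    {reg} = {left} {e.op} {right}")
            kids = (left, right)
        nodes.append((reg, e, kids))
        return reg

    root = forward(expr)

    # Reverse pass: every adjoint dv<i> is a separate local ("register").
    for reg, _, _ in nodes:
        lines.append(f"    d{reg} = 0.0")
    lines.append(f"    d{root} = 1.0")
    for reg, e, kids in reversed(nodes):  # parents before children
        if not kids:
            continue
        left, right = kids
        if e.op == "+":
            lines += [f"    d{left} += d{reg}", f"    d{right} += d{reg}"]
        elif e.op == "-":
            lines += [f"    d{left} += d{reg}", f"    d{right} -= d{reg}"]
        else:  # "*": product rule
            lines += [f"    d{left} += d{reg} * {right}",
                      f"    d{right} += d{reg} * {left}"]

    # A variable used several times sums the adjoints of all its occurrences.
    grads: Dict[str, List[str]] = {p: [] for p in params}
    for reg, e, _ in nodes:
        if isinstance(e, Var):
            grads[e.name].append(f"d{reg}")
    entries = ", ".join(
        f"{p!r}: {' + '.join(regs) if regs else '0.0'}" for p, regs in grads.items()
    )
    lines.append(f"    return {root}, {{{entries}}}")
    src = f"def generated({', '.join(params)}):\n" + "\n".join(lines)
    namespace: Dict[str, object] = {}
    exec(src, namespace)  # compile the generated source into a function
    return namespace["generated"], src


if __name__ == "__main__":
    # Squared residual of one tuple: (a*x + b - y) * (a*x + b - y)
    res = BinOp("-", BinOp("+", BinOp("*", Var("a"), Var("x")), Var("b")), Var("y"))
    fn, src = compile_gradient(BinOp("*", res, res), ["a", "b", "x", "y"])
    print(src)
    # prints (4.0, {'a': 12.0, 'b': 4.0, 'x': 8.0, 'y': -4.0})
    print(fn(2.0, 1.0, 3.0, 5.0))
```

Because each partial derivative lives in its own local variable, the generated function needs no runtime tape or gradient table; a JIT compiler such as PostgreSQL's LLVM-based one could, in principle, keep such values in registers.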

In this paper, we propose automatic differentiation in PostgreSQL: we extend our previously proposed SQL lambda functions to compute derivatives, even for matrix operations, by traversing the expression tree. The evaluation shows that compiled execution is up to six times faster than interpreted execution for regression tasks.
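
For matrix operations, the derivative such an operator must reproduce can be written in closed form. For linear regression with mean squared error L(w) = (1/n) ||Xw - y||^2, the gradient is (2/n) X^T (Xw - y). The sketch below, in plain Python with hypothetical helper names, evaluates this gradient, sanity-checks it against central finite differences, and uses it for batch gradient descent. It illustrates the mathematics of a regression task only and makes no claim about how the paper's operator or its compiled code is structured.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(X: Matrix, w: Vector) -> Vector:
    """Compute X w."""
    return [sum(xij * wj for xij, wj in zip(row, w)) for row in X]


def mat_t_vec(X: Matrix, r: Vector) -> Vector:
    """Compute X^T r without materialising the transpose."""
    return [sum(row[j] * ri for row, ri in zip(X, r)) for j in range(len(X[0]))]


def residuals(X: Matrix, y: Vector, w: Vector) -> Vector:
    return [p - t for p, t in zip(matvec(X, w), y)]


def loss(X: Matrix, y: Vector, w: Vector) -> float:
    """Mean squared error L(w) = (1/n) * ||X w - y||^2."""
    r = residuals(X, y, w)
    return sum(ri * ri for ri in r) / len(y)


def gradient(X: Matrix, y: Vector, w: Vector) -> Vector:
    """Closed-form gradient (2/n) * X^T (X w - y)."""
    n = len(y)
    return [2.0 * g / n for g in mat_t_vec(X, residuals(X, y, w))]


def numeric_gradient(X: Matrix, y: Vector, w: Vector, h: float = 1e-6) -> Vector:
    """Central finite differences, used only to sanity-check `gradient`."""
    g = []
    for j in range(len(w)):
        up = w[:j] + [w[j] + h] + w[j + 1:]
        down = w[:j] + [w[j] - h] + w[j + 1:]
        g.append((loss(X, y, up) - loss(X, y, down)) / (2 * h))
    return g


def gradient_descent(X: Matrix, y: Vector, lr: float = 0.1, iterations: int = 500) -> Vector:
    """Batch gradient descent starting from w = 0."""
    w = [0.0] * len(X[0])
    for _ in range(iterations):
        w = [wj - lr * gj for wj, gj in zip(w, gradient(X, y, w))]
    return w


if __name__ == "__main__":
    # Toy data generated by y = 1 + 2x; the first column of X is the bias term.
    X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
    y = [1.0, 3.0, 5.0, 7.0]
    print(gradient(X, y, [0.5, -0.3]), numeric_gradient(X, y, [0.5, -0.3]))
    print(gradient_descent(X, y))  # converges towards [1.0, 2.0]
```

The step size is an assumption tied to the toy data: the Hessian (2/n) X^T X has a largest eigenvalue of about 8.4 here, so any learning rate below roughly 0.24 converges.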

        • Published in

          DEEM '23: Proceedings of the Seventh Workshop on Data Management for End-to-End Machine Learning
          June 2023
          51 pages
          ISBN:9798400702044
          DOI:10.1145/3595360

          Copyright © 2023 ACM


          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          Overall Acceptance Rate: 23 of 37 submissions, 62%