DOI: 10.1145/3400903.3400915
research-article

Freedom for the SQL-Lambda: Just-in-Time-Compiling User-Injected Functions in PostgreSQL

Published: 30 July 2020

Abstract

As part of the code-generating database system HyPer, SQL lambda functions allow user-defined metrics to be injected into data mining operators at compile time. Since version 11, PostgreSQL has supported just-in-time compilation with LLVM for expression evaluation, which makes it possible to transfer the concept of SQL lambda functions to this open-source database system. In this study, we extend PostgreSQL with two subquery types for lambda expressions: one pre-materialises the result, and the other returns a cursor from which tuples can be requested on demand. We demonstrate these subquery types in conjunction with dedicated table functions for data mining algorithms such as PageRank, k-Means clustering and labelling. Furthermore, we support four levels of optimisation for query execution, ranging from interpreted function calls to just-in-time-compiled execution. With some adjustments to PostgreSQL's execution engine, the latter turns our lambda functions into genuinely user-injected code. In our evaluation with the LDBC social network benchmark for PageRank and the Chicago taxi data set for clustering, optimised lambda functions achieved performance comparable to hard-coded implementations and to HyPer's data mining algorithms.
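To make the idea of a user-injected metric concrete, the following Python sketch mimics a k-Means operator (Lloyd's algorithm) whose distance function is supplied by the caller as a lambda. It is an illustration under stated assumptions, not the authors' PostgreSQL implementation: `kmeans`, `squared_euclid` and `manhattan` are hypothetical names, and plain Python callables stand in for SQL lambda expressions and for the table functions described above.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
# Hypothetical stand-in for an SQL lambda: maps two tuples to a distance.
Distance = Callable[[Point, Point], float]


def kmeans(points: Sequence[Point], centers: Sequence[Point],
           distance: Distance, max_iter: int = 10) -> Tuple[List[Point], List[int]]:
    """Lloyd-style k-Means whose distance metric is injected by the caller."""
    centers = [tuple(c) for c in centers]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: the injected lambda decides the nearest center.
        new_labels = [min(range(len(centers)), key=lambda j: distance(p, centers[j]))
                      for p in points]
        # Update step: arithmetic mean of each cluster; empty clusters keep their center.
        new_centers = []
        for j, c in enumerate(centers):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(m[d] for m in members) / len(members)
                                         for d in range(len(c))))
            else:
                new_centers.append(c)
        converged = new_labels == labels and new_centers == centers
        labels, centers = new_labels, new_centers
        if converged:
            break
    return centers, labels


# Two interchangeable "lambda" metrics (hypothetical examples).
squared_euclid = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
manhattan = lambda a, b: sum(abs(x - y) for x, y in zip(a, b))

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, labels = kmeans(points, [(0.0, 0.0), (10.0, 10.0)], squared_euclid)
print(centers, labels)  # [(0.0, 0.5), (10.0, 10.5)] [0, 0, 1, 1]
```

Only the assignment step consults the injected lambda; the update step still takes the arithmetic mean of each cluster, which is the optimal centre for squared Euclidean distance but only an approximation for other metrics such as Manhattan distance.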


Information

Published In

ACM Other conferences
SSDBM '20: Proceedings of the 32nd International Conference on Scientific and Statistical Database Management
July 2020
241 pages
ISBN:9781450388146
DOI:10.1145/3400903
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Database Operators
  2. Lambda Functions
  3. SQL

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SSDBM 2020

Acceptance Rates

Overall Acceptance Rate 56 of 146 submissions, 38%

Article Metrics

  • Downloads (last 12 months): 26
  • Downloads (last 6 weeks): 0
Reflects downloads up to 27 Jan 2025

Cited By

  • (2024) On Reasoning About Black-Box Udfs by Classifying their Performance Characteristics. Proceedings of the 32nd International Conference on Information Systems Development. DOI: 10.62036/ISD.2024.83. Online publication date: 2024.
  • (2024) Hardware-Efficient Data Imputation through DBMS Extensibility. Proceedings of the VLDB Endowment 17:11, 3497-3510. DOI: 10.14778/3681954.3682016. Online publication date: 1-Jul-2024.
  • (2024) Give a JIT on GPUs: NVRTC for Code-Generating Database Systems. 2024 IEEE 40th International Conference on Data Engineering Workshops (ICDEW), 384-387. DOI: 10.1109/ICDEW61823.2024.00061. Online publication date: 13-May-2024.
  • (2024) Higher-Order SQL Lambda Functions. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 5622-5628. DOI: 10.1109/ICDE60146.2024.00450. Online publication date: 13-May-2024.
  • (2023) User-Defined Functions in Modern Data Engines. 2023 IEEE 39th International Conference on Data Engineering (ICDE), 3593-3598. DOI: 10.1109/ICDE55515.2023.00276. Online publication date: Apr-2023.
  • (2022) Babelfish. Proceedings of the VLDB Endowment 15:2, 196-210. DOI: 10.14778/3489496.3489501. Online publication date: 4-Feb-2022.
  • (2022) Recursive SQL and GPU-support for in-database machine learning. Distributed and Parallel Databases 40:2-3, 205-259. DOI: 10.1007/s10619-022-07417-7. Online publication date: 1-Sep-2022.
  • (2021) Procedural extensions of SQL. Proceedings of the VLDB Endowment 14:8, 1378-1391. DOI: 10.14778/3457390.3457402. Online publication date: 21-Oct-2021.
