research-article

Efficient and Portable Einstein Summation in SQL

Authors:
Mark Blacher, Friedrich Schiller University Jena, Jena, Germany (ORCID: 0009-0007-2009-7996)
Julien Klaus, Friedrich Schiller University Jena, Jena, Germany (ORCID: 0000-0002-1498-2653)
Christoph Staudt, Friedrich Schiller University Jena, Jena, Germany (ORCID: 0009-0000-4250-546X)
Sören Laue, University of Hamburg, Hamburg, Germany (ORCID: 0000-0003-4351-9868)
Viktor Leis, Technical University of Munich, Munich, Germany (ORCID: 0000-0001-5676-8017)
Joachim Giesen, Friedrich Schiller University Jena, Jena, Germany (ORCID: 0000-0001-6598-6833)

Proceedings of the ACM on Management of Data, Volume 1, Issue 2, Article No. 121, pp. 1–19. https://doi.org/10.1145/3589266

Published: 20 June 2023

Abstract

Computational problems ranging from artificial intelligence to physics require efficient computations of large tensor expressions. These tensor expressions can often be represented in Einstein notation. To evaluate tensor expressions in Einstein notation, that is, for the actual Einstein summation, usually external libraries are used. Surprisingly, Einstein summation operations on tensors fit well with fundamental SQL constructs. We show that by applying only four mapping rules and a simple decomposition scheme using common table expressions, large tensor expressions in Einstein notation can be translated to portable and efficient SQL code. The ability to execute large Einstein summation queries opens up new possibilities to process data within SQL. We demonstrate the power of Einstein summation queries on four use cases, namely querying triplestore data, solving Boolean satisfiability problems, performing inference in graphical models, and simulating quantum circuits. The performance of Einstein summation queries, however, depends on the query engine implemented in the database system. Therefore, supporting efficient Einstein summation computations in database systems presents new research challenges for the design and implementation of query engines.
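To make the idea concrete, the following is a minimal sketch of how an Einstein summation can be phrased with basic SQL constructs. It is an illustration under stated assumptions, not the paper's exact translation: it assumes tensors are stored in coordinate form (one row per nonzero entry), and the table names A, B, v, the column name val, and the sample data are hypothetical. The first query evaluates the expression "ik,kj->ij" (a matrix product) with a join, a GROUP BY over the output indices, and a SUM over products of values. The second query evaluates "ik,kj,j->i" by splitting it into two pairwise contractions, with the intermediate result held in a common table expression. SQLite is used through Python's standard sqlite3 module only so that the example is self-contained.

```python
import sqlite3

# Hypothetical sample tensors in coordinate (COO) form: one row per entry.
# Table and column names are illustrative assumptions, not from the paper.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (i INTEGER, k INTEGER, val REAL);
CREATE TABLE B (k INTEGER, j INTEGER, val REAL);
CREATE TABLE v (j INTEGER, val REAL);
INSERT INTO A VALUES (0,0,1.0), (0,1,2.0), (1,0,3.0), (1,1,4.0);
INSERT INTO B VALUES (0,0,5.0), (0,1,6.0), (1,0,7.0), (1,1,8.0);
INSERT INTO v VALUES (0,1.0), (1,-1.0);
""")

# Einsum "ik,kj->ij": join on the shared index k, keep the output
# indices i and j, and sum the products of the values.
matmul = conn.execute("""
SELECT A.i AS i, B.j AS j, SUM(A.val * B.val) AS val
FROM A, B
WHERE A.k = B.k
GROUP BY A.i, B.j
ORDER BY i, j
""").fetchall()

# Einsum "ik,kj,j->i": decomposed into two pairwise contractions.
# The CTE T1 holds the intermediate tensor for "kj,j->k".
chain = conn.execute("""
WITH T1 AS (
    SELECT B.k AS k, SUM(B.val * v.val) AS val
    FROM B, v
    WHERE B.j = v.j
    GROUP BY B.k
)
SELECT A.i AS i, SUM(A.val * T1.val) AS val
FROM A, T1
WHERE A.k = T1.k
GROUP BY A.i
ORDER BY i
""").fetchall()

print(matmul)  # rows (i, j, val) of the product of A and B
print(chain)   # rows (i, val) of A applied to (B applied to v)
```

Contracting B with v first keeps the intermediate result one-dimensional. The order of pairwise contractions is a design choice, and it can strongly affect the size of the intermediate tables.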

Supplemental Material

PACMMOD-V1mod121.mp4 (mp4, 50.3 MB)

Published in
Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 2, June 2023, 2310 pages
EISSN: 2836-6573
DOI: 10.1145/3605748
Editor: Divyakant Agrawal, UC Santa Barbara, United States
Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Einstein summation
SQL
graphical models
model counting
quantum circuits
semantic search