Abstract
Computational problems ranging from artificial intelligence to physics require efficient computations of large tensor expressions. These tensor expressions can often be represented in Einstein notation. Evaluating a tensor expression in Einstein notation, that is, performing the actual Einstein summation, usually relies on external libraries. Surprisingly, Einstein summation operations on tensors fit well with fundamental SQL constructs. We show that by applying only four mapping rules and a simple decomposition scheme using common table expressions, large tensor expressions in Einstein notation can be translated to portable and efficient SQL code. The ability to execute large Einstein summation queries opens up new possibilities for processing data within SQL. We demonstrate the power of Einstein summation queries on four use cases, namely querying triplestore data, solving Boolean satisfiability problems, performing inference in graphical models, and simulating quantum circuits. The performance of Einstein summation queries, however, depends on the query engine implemented in the database system. Therefore, supporting efficient Einstein summation computations in database systems presents new research challenges for the design and implementation of query engines.
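To make the idea concrete, here is a minimal sketch of how an Einstein summation such as `ij,jk,k->i` (a matrix-matrix-vector product) might be phrased in SQL. It is an illustration under stated assumptions, not the paper's method: the abstract does not spell out the four mapping rules, so the sketch relies only on the general correspondence between shared indices and joins, products and multiplied values, and summed-out indices and `GROUP BY` with `SUM`. The coordinate-style table layout (one column per index plus a `val` column), the table names `A`, `B`, `v`, `Bv`, and the helper `load` are all hypothetical. SQLite from the Python standard library is used only so the example can be run as is.

```python
import sqlite3


def load(conn, name, index_cols, entries):
    """Store a sparse tensor as a table: one column per index plus a value column.

    Hypothetical helper; the layout is an assumption, not taken from the paper.
    """
    cols = ", ".join(f"{c} INTEGER" for c in index_cols)
    conn.execute(f"CREATE TABLE {name} ({cols}, val REAL)")
    marks = ", ".join("?" for _ in range(len(index_cols) + 1))
    conn.executemany(f"INSERT INTO {name} VALUES ({marks})", entries)


conn = sqlite3.connect(":memory:")

# A = [[1, 2], [3, 4]], B = [[0, 1], [1, 0]], v = [5, 6]; zero entries are omitted.
load(conn, "A", ["i", "j"], [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0), (1, 1, 4.0)])
load(conn, "B", ["j", "k"], [(0, 1, 1.0), (1, 0, 1.0)])
load(conn, "v", ["k"], [(0, 5.0), (1, 6.0)])

# The expression ij,jk,k->i is split into two pairwise contractions.
# The common table expression holds the intermediate tensor (jk,k->j);
# the outer query performs the final contraction (ij,j->i).
QUERY = """
WITH Bv(j, val) AS (
    SELECT B.j, SUM(B.val * v.val)      -- multiply values, sum out index k
    FROM B JOIN v ON B.k = v.k          -- shared index k becomes a join condition
    GROUP BY B.j                        -- surviving index j becomes a group key
)
SELECT A.i, SUM(A.val * Bv.val) AS val  -- multiply values, sum out index j
FROM A JOIN Bv ON A.j = Bv.j
GROUP BY A.i
ORDER BY A.i
"""

result = conn.execute(QUERY).fetchall()
print(result)
```

Contracting `B` with `v` first keeps the intermediate result a vector rather than a matrix. For larger expressions, the choice of such a contraction order is what determines the size of the intermediate tables.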
Index Terms
- Efficient and Portable Einstein Summation in SQL