research-article

Efficient and Portable Einstein Summation in SQL

Published: 20 June 2023
Abstract

Computational problems ranging from artificial intelligence to physics require the efficient computation of large tensor expressions. These tensor expressions can often be represented in Einstein notation. Evaluating expressions in Einstein notation, that is, performing the actual Einstein summation, usually relies on external libraries. Surprisingly, Einstein summation operations on tensors fit well with fundamental SQL constructs. We show that by applying only four mapping rules and a simple decomposition scheme using common table expressions, large tensor expressions in Einstein notation can be translated to portable and efficient SQL code. The ability to execute large Einstein summation queries opens up new possibilities to process data within SQL. We demonstrate the power of Einstein summation queries on four use cases, namely querying triplestore data, solving Boolean satisfiability problems, performing inference in graphical models, and simulating quantum circuits. The performance of Einstein summation queries, however, depends on the query engine implemented in the database system. Therefore, supporting efficient Einstein summation computations in database systems presents new research challenges for the design and implementation of query engines.
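
As a rough illustration of how an Einstein summation can be phrased in SQL, the sketch below evaluates the expression "ij,jk,k->i" with Python's built-in sqlite3 module. The abstract does not spell out the four mapping rules, so the mapping used here is an assumption for illustration rather than the paper's scheme: tensors are stored as coordinate lists, shared indices become join conditions, output indices become GROUP BY columns, and the summation becomes SUM over the product of values. The table names A, B, v and the intermediate names T1, T2 are hypothetical. The expression is split into two pairwise contractions, each written as its own common table expression.

```python
import sqlite3

# Assumed encoding (not taken from the paper): one row per tensor entry,
# with one column per index plus a value column.
A = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0), (1, 1, 4.0)]  # A[i, j]
B = [(0, 0, 5.0), (0, 1, 6.0), (1, 0, 7.0), (1, 1, 8.0)]  # B[j, k]
v = [(0, 1.0), (1, 2.0)]                                  # v[k]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (i INTEGER, j INTEGER, val REAL)")
conn.execute("CREATE TABLE B (j INTEGER, k INTEGER, val REAL)")
conn.execute("CREATE TABLE v (k INTEGER, val REAL)")
conn.executemany("INSERT INTO A VALUES (?, ?, ?)", A)
conn.executemany("INSERT INTO B VALUES (?, ?, ?)", B)
conn.executemany("INSERT INTO v VALUES (?, ?)", v)

# "ij,jk,k->i" decomposed into two pairwise contractions:
#   T1: "jk,k->j"  join on k, group by j
#   T2: "ij,j->i"  join on j, group by i
query = """
WITH T1 AS (
    SELECT B.j AS j, SUM(B.val * v.val) AS val
    FROM B JOIN v ON B.k = v.k
    GROUP BY B.j
),
T2 AS (
    SELECT A.i AS i, SUM(A.val * T1.val) AS val
    FROM A JOIN T1 ON A.j = T1.j
    GROUP BY A.i
)
SELECT i, val FROM T2 ORDER BY i
"""
result = conn.execute(query).fetchall()
print(result)  # [(0, 63.0), (1, 143.0)]
```

Contracting B with v first keeps the intermediate result a vector instead of a matrix; the order in which pairs are contracted is a design choice that can strongly affect the size of intermediate tables.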

Supplemental Material

PACMMOD-V1mod121.mp4 (mp4, 50.3 MB)

        • Published in

          Proceedings of the ACM on Management of Data, Volume 1, Issue 2 (PACMMOD)
          June 2023
          2310 pages
          EISSN: 2836-6573
          DOI: 10.1145/3605748

          Copyright © 2023 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States
