Fast Bilinear Algorithms for Symmetric Tensor Contractions

Edgar Solomonik; James Demmel

doi:10.1515/cmam-2019-0075

Published by De Gruyter February 5, 2020

From the journal Computational Methods in Applied Mathematics

https://doi.org/10.1515/cmam-2019-0075

Abstract

In matrix-vector multiplication, matrix symmetry does not permit a straightforward reduction in computational cost. More generally, in contractions of symmetric tensors, the symmetries are not preserved in the usual algebraic form of contraction algorithms. We introduce an algorithm that reduces the bilinear complexity (number of computed elementwise products) for most types of symmetric tensor contractions. In particular, it lowers the bilinear complexity of symmetrized contractions of symmetric tensors of order $s+v$ and $v+t$ by a factor of $\frac{(s+t+v)!}{s!\,t!\,v!}$ to leading order. The algorithm computes a symmetric tensor of bilinear products, then subtracts unwanted parts of its partial sums. Special cases of this algorithm provide improvements to the bilinear complexity of the multiplication of a symmetric matrix and a vector, the symmetrized vector outer product, and the symmetrized product of symmetric matrices. While the algorithm requires more additions for each elementwise product, the total number of operations is in some cases less than classical algorithms, for tensors of any size. We provide a round-off error analysis of the algorithm and demonstrate that the error is not too large in practice. Finally, we provide an optimized implementation for one variant of the symmetry-preserving algorithm, which achieves speedups of up to $4.58\times$ for a particular tensor contraction, relative to a classical approach that casts the problem as a matrix-matrix multiplication.
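As an illustration of the idea for the simplest special case, the sketch below multiplies a symmetric matrix by a vector using n(n+1)/2 products of a matrix-derived value with a vector-derived value instead of n^2. It is a minimal sketch under stated assumptions, not the optimized implementation mentioned in the abstract: the function names (`reduction_factor`, `symv_classical`, `symv_fast`) are hypothetical, and the particular formulation (forming A[i][j] * (b[i] + b[j]) for i < j, then correcting each entry with the off-diagonal row sum) is one way to realize "compute symmetric products, then subtract unwanted parts of the partial sums". The helper `reduction_factor` evaluates the leading-order factor (s+t+v)!/(s! t! v!) from the abstract.

```python
from math import factorial


def reduction_factor(s, t, v):
    """Leading-order reduction factor (s+t+v)! / (s! t! v!)."""
    return factorial(s + t + v) // (factorial(s) * factorial(t) * factorial(v))


def symv_classical(A, b):
    """Classical c = A b; counts one product per (i, j) pair."""
    n = len(b)
    mults = 0
    c = [0.0] * n
    for i in range(n):
        for j in range(n):
            c[i] += A[i][j] * b[j]
            mults += 1
    return c, mults


def symv_fast(A, b):
    """c = A b for symmetric A using n(n+1)/2 matrix-by-vector products.

    Assumption (illustrative formulation): for i < j form
    z = A[i][j] * (b[i] + b[j]) once and add it to both c[i] and c[j].
    Each c[i] then contains an unwanted term b[i] * sum_{j != i} A[i][j],
    which is removed by the diagonal correction below.
    """
    n = len(b)
    mults = 0
    c = [0.0] * n
    # Off-diagonal row sums need additions only, no products.
    off_diag_sum = [sum(A[i][j] for j in range(n) if j != i) for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            z = A[i][j] * (b[i] + b[j])  # one product shared by c[i] and c[j]
            mults += 1
            c[i] += z
            c[j] += z
    for i in range(n):
        # Add the diagonal term and subtract the unwanted partial sum.
        c[i] += (A[i][i] - off_diag_sum[i]) * b[i]
        mults += 1
    return c, mults
```

For a 3 × 3 symmetric matrix the two routines return the same vector, with 9 counted products for the classical loop and 6 for the sketch. In the abstract's notation this case corresponds to s = 1, v = 1, t = 0, so `reduction_factor(1, 0, 1)` gives the leading-order factor 2, while the symmetrized product of symmetric matrices (s = t = v = 1) gives 6. The extra additions visible in the sketch reflect the trade-off the abstract describes.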

Keywords: Tensor Contractions; Symmetric Matrices; Symmetric Tensors; Bilinear Complexity

MSC 2010: 65Y04; 65Y20; 68Q25; 15B57

Funding source: National Science Foundation

Award Identifier / Grant number: ACI-1548562

Funding statement: This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1548562. Via XSEDE, the authors made use of the TACC Stampede2 supercomputer.

Acknowledgements

We would like to thank Devin Matthews, Toru Shiozaki, Hung Woei Neoh, and anonymous reviewers for helpful comments that served to improve this paper.

Received: 2019-04-28

Revised: 2019-10-02

Accepted: 2020-01-09

Published Online: 2020-02-05

Published in Print: 2021-01-01
