Abstract
Tensor–vector multiplication is one of the core operations in tensor computations. We have recently investigated high-performance, single-core implementations of this bandwidth-bound operation. Here, we investigate efficient shared-memory implementations. After carefully analyzing the design space, we implement a number of alternatives using OpenMP and compare them experimentally. Experimental results on systems with up to 8 sockets show near-peak performance for the proposed algorithms.
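To make the operation concrete: for a dense order-d tensor A with extents n_0 × ... × n_{d-1}, the mode-k product with a vector x of length n_k is y(i_0, ..., i_{k-1}, i_{k+1}, ..., i_{d-1}) = Σ_j A(i_0, ..., i_{k-1}, j, i_{k+1}, ..., i_{d-1}) x(j). The C/OpenMP code below is a minimal sketch, not any of the algorithms evaluated in the paper. It assumes row-major storage (last index varies fastest), views A as a left × n_k × right array, and lets each thread own a disjoint slice of y so that no reduction or locking is needed. The function name tvm_mode_k and the block width TVM_BLOCK are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical block width (in elements) along the contiguous "right" index. */
#define TVM_BLOCK 256

/*
 * Sketch of mode-k tensor-vector multiplication y = A x_k x for a dense
 * order-d tensor stored row-major (last index varies fastest).
 *   dims : extents n_0 .. n_{d-1}
 *   x    : vector of length dims[k]
 *   y    : output with prod_{m != k} dims[m] entries, also row-major
 */
void tvm_mode_k(const double *A, const size_t *dims, size_t d, size_t k,
                const double *x, double *y)
{
    assert(k < d);

    /* View A as a left x n_k x right array. */
    size_t left = 1, right = 1;
    for (size_t m = 0; m < k; ++m) left *= dims[m];
    for (size_t m = k + 1; m < d; ++m) right *= dims[m];
    const size_t nk = dims[k];
    const size_t nblocks = (right + TVM_BLOCK - 1) / TVM_BLOCK;

    /*
     * Each (l, b) pair writes a disjoint block of y, so threads never
     * conflict. Collapsing both loops keeps parallelism available even
     * when left == 1 (mode 0) or right is small (last mode).
     */
    #pragma omp parallel for collapse(2) schedule(static)
    for (size_t l = 0; l < left; ++l) {
        for (size_t b = 0; b < nblocks; ++b) {
            const size_t r0 = b * TVM_BLOCK;
            const size_t r1 = (r0 + TVM_BLOCK < right) ? r0 + TVM_BLOCK : right;
            double *yl = y + l * right;

            for (size_t r = r0; r < r1; ++r) yl[r] = 0.0;

            /* Stream over contiguous rows A(l, j, r0:r1) and accumulate. */
            for (size_t j = 0; j < nk; ++j) {
                const double *a = A + (l * nk + j) * right;
                const double xj = x[j];
                for (size_t r = r0; r < r1; ++r) yl[r] += a[r] * xj;
            }
        }
    }
}
```

When compiled with OpenMP enabled (for example, -fopenmp with GCC), the (l, b) iterations are distributed across threads; without it the pragma is ignored and the code runs serially with the same result. Each tensor entry is read once and used in a single multiply-add, so the kernel performs about 2 flops per 8-byte double loaded. This low arithmetic intensity is why the operation is bandwidth-bound.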
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Pawłowski, F., Uçar, B., Yzelman, AJ. (2020). High Performance Tensor–Vector Multiplication on Shared-Memory Systems. In: Wyrzykowski, R., Deelman, E., Dongarra, J., Karczewski, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2019. Lecture Notes in Computer Science, vol 12043. Springer, Cham. https://doi.org/10.1007/978-3-030-43229-4_4
DOI: https://doi.org/10.1007/978-3-030-43229-4_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-43228-7
Online ISBN: 978-3-030-43229-4
eBook Packages: Computer Science, Computer Science (R0)