ABSTRACT
Many problems in science and engineering are in practice modeled and solved through matrix computations. Often, the matrices involved have structure, such as symmetry or triangularity, that reduces the operation count needed to perform the computation. For example, dense linear systems of equations are solved by first reducing them to triangular form, and optimization problems may yield matrices with various kinds of structure. The well-known BLAS (basic linear algebra subprograms) interface provides a small set of structured matrix computations, chosen to serve a certain set of higher-level functions (LAPACK). However, if a user encounters a computation or structure that is not supported, she loses the benefits of the structure and must fall back on a generic library. In this paper, we address this problem by providing a compiler that translates a given basic linear algebra computation on structured matrices into optimized C code, optionally vectorized with intrinsics. Our work combines prior work on the Spiral-like LGen compiler with techniques from polyhedral compilation to mathematically capture matrix structures. In this paper we consider upper/lower triangular and symmetric matrices, but the approach is extensible to a much larger set including blocked structures. We run experiments on a modern Intel platform against the Intel MKL library and a baseline implementation, showing competitive performance results for both BLAS and non-BLAS functionality.
Supplemental Material
The auxiliary material contains two files: ae.pdf (guideline to install and run the artifact) and lgen-ae.tar.gz (the artifact's compressed folder).
References
- LGen: A basic linear algebra compiler. Available at http://spiral.net/software/lgen.html.
- E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. LAPACK Users' Guide. Society for Industrial and Applied Mathematics, third edition, 1999.
- C. Bastoul. Code generation in the polyhedral model is easier than you think. In Parallel Architectures and Compilation Techniques (PACT), pages 7–16, 2004.
- U. Beaugnon, A. Kravets, S. van Haastregt, R. Baghdadi, D. Tweed, J. Absar, and A. Lokhmotov. VOBLA: A vehicle for optimized basic linear algebra. In Languages, Compilers and Tools for Embedded Systems (LCTES), pages 115–124, 2014.
- U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan. A practical automatic polyhedral parallelizer and locality optimizer. In Programming Language Design and Implementation (PLDI), pages 101–113, 2008.
- J. J. Dongarra, J. Du Croz, S. Hammarling, and I. S. Duff. A set of level 3 basic linear algebra subprograms. ACM Transactions on Mathematical Software (TOMS), 16(1):1–17, 1990.
- D. Fabregat-Traver and P. Bientinesi. A domain-specific compiler for linear algebra operations. In High Performance Computing for Computational Science (VECPAR 2012), volume 7851 of Lecture Notes in Computer Science (LNCS), pages 346–361. Springer, 2013.
- P. Feautrier and C. Lengauer. Encyclopedia of Parallel Computing, chapter Polyhedron Model. Springer, 2011.
- F. Franchetti, F. de Mesmay, D. McFarlin, and M. Püschel. Operator language: A program generation framework for fast kernels. In IFIP Working Conference on Domain-Specific Languages (DSL WC), volume 5658 of Lecture Notes in Computer Science (LNCS), pages 385–410. Springer, 2009.
- K. Goto and R. A. van de Geijn. Anatomy of high-performance matrix multiplication. ACM Transactions on Mathematical Software (TOMS), 34(3):12:1–12:25, 2008.
- K. Goto and R. A. van de Geijn. High-performance implementation of the level-3 BLAS. ACM Transactions on Mathematical Software (TOMS), 35(1):4:1–4:14, 2008.
- T. Grosser, A. Groesslinger, and C. Lengauer. Polly — performing polyhedral optimizations on a low-level intermediate representation. Parallel Processing Letters, 22(04):1250010, 2012.
- G. Guennebaud, B. Jacob, et al. Eigen v3. http://eigen.tuxfamily.org.
- J. A. Gunnels, F. G. Gustavson, G. Henry, and R. A. van de Geijn. FLAME: Formal linear algebra methods environment. ACM Transactions on Mathematical Software (TOMS), 27(4):422–455, 2001.
- Intel math kernel library (MKL). http://software.intel.com/en-us/intel-mkl.
- D. Kim, L. Renganarayanan, D. Rostron, S. Rajopadhye, and M. M. Strout. Multi-level tiling: M for the price of one. In Supercomputing (SC), pages 1–12, 2007.
- N. Kyrtatas, D. G. Spampinato, and M. Püschel. A basic linear algebra compiler for embedded processors. In Design, Automation and Test in Europe (DATE), pages 1054–1059, 2015.
- B. Marker, J. Poulson, D. Batory, and R. van de Geijn. Designing linear algebra algorithms by transformation: Mechanizing the expert developer. In High Performance Computing for Computational Science (VECPAR 2012), volume 7851 of Lecture Notes in Computer Science (LNCS), pages 362–378. Springer, 2013.
- M. Püschel, J. M. F. Moura, J. Johnson, D. Padua, M. Veloso, B. Singer, J. Xiong, F. Franchetti, A. Gacic, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo. SPIRAL: Code generation for DSP transforms. Proceedings of the IEEE, 93(2):232–275, 2005.
- M. Püschel, F. Franchetti, and Y. Voronenko. Encyclopedia of Parallel Computing, chapter Spiral. Springer, 2011.
- D. G. Spampinato and M. Püschel. A basic linear algebra compiler. In International Symposium on Code Generation and Optimization (CGO), pages 23–32, 2014.
- F. G. Van Zee and R. A. van de Geijn. BLIS: A framework for rapidly instantiating BLAS functionality. ACM Transactions on Mathematical Software (TOMS), 41(3):14:1–14:33, 2015.
- F. G. Van Zee, E. Chan, R. A. van de Geijn, E. S. Quintana-Orti, and G. Quintana-Orti. The libFLAME library for dense matrix computations. IEEE Design & Test, 11(6):56–63, Nov. 2009.
- R. Veras and F. Franchetti. Capturing the expert: Generating fast matrix-multiply kernels with Spiral. In High Performance Computing for Computational Science (VECPAR 2014), volume 8969 of Lecture Notes in Computer Science (LNCS), pages 236–244. Springer, 2015.
- S. Verdoolaege. isl: An integer set library for the polyhedral model. In Mathematical Software (MS), volume 6327 of Lecture Notes in Computer Science (LNCS), pages 299–302. Springer, 2010.
- K. Yotov, X. Li, G. Ren, M. Garzaran, D. Padua, K. Pingali, and P. Stodghill. Is search really necessary to generate high-performance BLAS? Proceedings of the IEEE, 93(2):358–386, 2005.
A basic linear algebra compiler for structured matrices