
Compiler-Optimized Kernels: An Efficient Alternative to Hand-Coded Inner Kernels

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3984)

Abstract

The use of highly optimized inner kernels is of paramount importance for obtaining efficient numerical algorithms. Often, such kernels are created by hand. In this paper, however, we present an alternative way to produce efficient matrix multiplication kernels, based on a set of simple codes that can be parameterized at compile time. Using the resulting kernels, we have been able to produce high-performance sparse and dense linear algebra codes on a variety of platforms.

This work was supported by the Ministerio de Ciencia y Tecnología of Spain (TIN2004-07739-C02-01).




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Herrero, J.R., Navarro, J.J. (2006). Compiler-Optimized Kernels: An Efficient Alternative to Hand-Coded Inner Kernels. In: Gavrilova, M.L., et al. Computational Science and Its Applications - ICCSA 2006. ICCSA 2006. Lecture Notes in Computer Science, vol 3984. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11751649_84


  • DOI: https://doi.org/10.1007/11751649_84

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-34079-9

  • Online ISBN: 978-3-540-34080-5

  • eBook Packages: Computer Science (R0)
