Abstract
When developing high-performance algorithms, blocking is a standard technique for increasing locality of reference. This paper describes the conflicting factors that influence the choice of blocking parameters. These factors include cache size, load balancing, memory overhead, algorithmic issues, and others. An optimal block size can be determined with respect to each of these factors. The resulting block sizes are independent of each other and can be implemented as several levels of blocking within a program. A tridiagonalization algorithm serves as an example to illustrate various blocking techniques.
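As a rough illustration of what blocking means, the following minimal sketch tiles a matrix multiplication so that each tile is reused while it is still in cache. It is not taken from the paper: the function names and the block size parameter `nb` are hypothetical, and it shows only a single level of blocking, whereas the paper discusses several levels and a tridiagonalization algorithm. In pure Python the cache effect itself is not measurable; the sketch shows only the loop structure.

```python
def matmul_blocked(a, b, nb=2):
    """Compute C = A * B with loop blocking (tiling).

    a is n x k and b is k x m, both given as lists of lists.
    nb is a hypothetical block size; in practice it would be tuned
    to factors such as cache size.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    # The three outer loops walk over blocks; the three inner loops
    # work inside one block, so a small tile of a, b and c is reused
    # before the computation moves on.
    for i0 in range(0, n, nb):
        for p0 in range(0, k, nb):
            for j0 in range(0, m, nb):
                for i in range(i0, min(i0 + nb, n)):
                    for p in range(p0, min(p0 + nb, k)):
                        aip = a[i][p]
                        for j in range(j0, min(j0 + nb, m)):
                            c[i][j] += aip * b[p][j]
    return c


def matmul_naive(a, b):
    """Unblocked reference implementation used for comparison."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]
```

The blocked and unblocked versions perform the same arithmetic and differ only in the order in which matrix elements are visited, which is why the block size can be chosen for performance reasons alone.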
This work was supported by the Austrian Science Fund (Österreichischer Fonds zur Förderung der wissenschaftlichen Forschung).
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gansterer, W.N., Kvasnicka, D.F., Ueberhuber, C.W. (1999). Blocking Techniques in Numerical Software. In: Zinterhof, P., Vajteršic, M., Uhl, A. (eds) Parallel Computation. ACPC 1999. Lecture Notes in Computer Science, vol 1557. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49164-3_13
DOI: https://doi.org/10.1007/3-540-49164-3_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65641-8
Online ISBN: 978-3-540-49164-4
eBook Packages: Springer Book Archive