Blocking Techniques in Numerical Software

Gansterer, Wilfried N.; Kvasnicka, Dieter F.; Ueberhuber, Christoph W.

doi:10.1007/3-540-49164-3_13

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1557)

Included in the following conference series:

International Conference of the Austrian Center for Parallel Computation

Abstract

When developing high-performance algorithms, blocking is a standard technique for increasing locality of reference. This paper describes the conflicting factors that influence the choice of blocking parameters, including cache size, load balancing, memory overhead, and algorithmic issues. An optimal block size can be determined with respect to each of these factors. The resulting block sizes are independent of each other and can be implemented as several levels of blocking within a program. A tridiagonalization algorithm serves as an example to illustrate various blocking techniques.
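As a rough illustration of what blocking means in practice, the sketch below tiles a matrix-matrix multiplication so that each inner step works on small sub-blocks that are reused before moving on. It is not the tridiagonalization algorithm discussed in the paper; the function names `blocked_matmul` and `naive_matmul` and the `block` parameter are assumptions made for this example, and a single block size stands in for the several levels of blocking the abstract mentions.

```python
def naive_matmul(a, b):
    """Reference triple loop: c = a @ b for non-empty lists of rows."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for p in range(k):
            for j in range(m):
                c[i][j] += a[i][p] * b[p][j]
    return c


def blocked_matmul(a, b, block=2):
    """Same product, computed tile by tile.

    `block` is a hypothetical tuning parameter: in real code it would be
    chosen from factors such as cache size, which this sketch does not model.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    # Outer loops walk over tiles of side `block`.
    for ii in range(0, n, block):
        for kk in range(0, k, block):
            for jj in range(0, m, block):
                # Inner loops stay inside one tile; min() handles edge tiles
                # when the dimensions are not multiples of `block`.
                for i in range(ii, min(ii + block, n)):
                    row_c = c[i]
                    for p in range(kk, min(kk + block, k)):
                        a_ip = a[i][p]
                        row_b = b[p]
                        for j in range(jj, min(jj + block, m)):
                            row_c[j] += a_ip * row_b[j]
    return c
```

Both functions return the same result for any positive `block`; only the order in which memory is touched differs. In pure Python the locality benefit is not observable, so the sketch is meant to show the loop structure rather than to demonstrate a speedup.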

This work was supported by the Austrian Science Fund (Österreichischer Fonds zur Förderung der wissenschaftlichen Forschung).

Author information

Authors and Affiliations

Institute for Applied and Numerical Mathematics, University of Technology, Vienna
Wilfried N. Gansterer & Christoph W. Ueberhuber
Institute for Physical and Theoretical Chemistry, University of Technology, Vienna
Dieter F. Kvasnicka

Editor information

Editors and Affiliations

Forschungsinstitut für Softwaretechnologie, Hellbrunnerstr. 34, A-5020, Salzburg, Austria
Peter Zinterhof
Slovak Academy of Sciences, Institute of Mathematics, Laboratory for Informatics, Dubravska 9, P.O. Box 56, 840 00, Bratislava, Slovakia
Marian Vajteršic
Universität Salzburg, RIST++, Hellbrunnerstr. 34, A-5020, Salzburg, Austria
Andreas Uhl

About this paper

Cite this paper

Gansterer, W.N., Kvasnicka, D.F., Ueberhuber, C.W. (1999). Blocking Techniques in Numerical Software. In: Zinterhof, P., Vajteršic, M., Uhl, A. (eds) Parallel Computation. ACPC 1999. Lecture Notes in Computer Science, vol 1557. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49164-3_13

DOI: https://doi.org/10.1007/3-540-49164-3_13
Published: 26 February 1999
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65641-8
Online ISBN: 978-3-540-49164-4
eBook Packages: Springer Book Archive
