
Blocking Techniques in Numerical Software

Conference paper in: Parallel Computation (ACPC 1999)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1557)


Abstract

When developing high-performance algorithms, blocking is a standard technique for increasing locality of reference. This paper describes the conflicting factors that influence the choice of blocking parameters, including cache size, load balancing, memory overhead, algorithmic issues, and others. An optimal block size can be determined with respect to each of these factors. The resulting block sizes are independent of one another and can be implemented as several levels of blocking within a program. A tridiagonalization algorithm serves as an example to illustrate the various blocking techniques.

This work was supported by the Austrian Science Fund (Österreichischer Fonds zur Förderung der wissenschaftlichen Forschung).
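The blocking idea summarized in the abstract can be made concrete with a small loop-tiling sketch. The code below is not taken from the paper; the routine name blocked_matmul, the row-major matrix layout, and the block size NB are illustrative assumptions only, with NB standing in for a value tuned so that the tiles being worked on fit in cache.

    /* Minimal sketch of loop blocking (tiling) for cache locality.
       Illustrative only: routine name, layout, and NB are assumptions,
       not taken from the paper. NB would be chosen so that three
       NB x NB tiles fit comfortably in cache. */

    #define NB 64   /* assumed block size; tune to the target cache */

    /* C := C + A*B for n x n matrices stored row-major in flat arrays */
    void blocked_matmul(int n, const double *A, const double *B, double *C)
    {
        for (int ii = 0; ii < n; ii += NB)
            for (int kk = 0; kk < n; kk += NB)
                for (int jj = 0; jj < n; jj += NB)
                    /* multiply one pair of tiles; the tiles stay cache-resident */
                    for (int i = ii; i < ii + NB && i < n; i++)
                        for (int k = kk; k < kk + NB && k < n; k++) {
                            double a = A[i * n + k];
                            for (int j = jj; j < jj + NB && j < n; j++)
                                C[i * n + j] += a * B[k * n + j];
                        }
    }

In the same spirit, the "several levels of blocking" mentioned in the abstract would correspond to nesting a second, smaller tile size inside NB, with each level tuned to a different factor such as cache size, load balance, or memory overhead.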




Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gansterer, W.N., Kvasnicka, D.F., Ueberhuber, C.W. (1999). Blocking Techniques in Numerical Software. In: Zinterhof, P., Vajteršic, M., Uhl, A. (eds) Parallel Computation. ACPC 1999. Lecture Notes in Computer Science, vol 1557. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49164-3_13

  • DOI: https://doi.org/10.1007/3-540-49164-3_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-65641-8

  • Online ISBN: 978-3-540-49164-4

