Performance tuning on IBM RS/6000 POWER2 systems

  • Conference paper
Applied Parallel Computing Industrial Computation and Optimization (PARA 1996)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1184)

Abstract

This tutorial describes the algorithms and architecture approach to producing high-performance codes for numerically intensive computations. In this approach, for a given computation, we design algorithms so that they perform optimally when run on a target machine, in this case the new POWER2™ machines from the RS/6000 family of RISC processors. The algorithmic features that we emphasize are functional parallelism, cache/register blocking, algorithmic prefetching, and loop unrolling. The architectural features of the POWER2 machine that we describe and that lead to high performance are multiple functional units; high bandwidth between registers, cache, and memory; a large number of fixed- and floating-point registers; and a large cache and TLB (translation lookaside buffer). The paper gives BLAS examples that illustrate how the algorithmic and architectural features interact to produce high-performance codes. These routines are included in ESSL (Engineering and Scientific Subroutine Library); an overview of ESSL is also given in the paper.

This paper is a condensation of [3] and is a formal presentation of the concepts presented in the tutorial.
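
To make register blocking and loop unrolling concrete, the following is a minimal sketch in C of a matrix-multiply update C += A * B, the operation behind the Level 3 BLAS routine DGEMM. It is not the ESSL implementation and is not taken from the paper: the function names, the row-major layout, the 2x2 block size, and the restriction to even n are all assumptions made for illustration. The blocked version keeps four partial sums in local variables so that each loaded element of A and B is used twice, and it exposes four independent multiply-add chains that multiple floating-point units could work on concurrently.

```c
#include <assert.h>
#include <stddef.h>

/* Reference version: C += A * B for n-by-n row-major matrices. */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = c[i * n + j];
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Hypothetical 2x2 register-blocked version of the same update.
 * The i and j loops are unrolled by two, and the 2x2 block of C
 * is held in local variables (register candidates) across the
 * whole k loop. Each iteration performs 4 loads for 4 multiply-adds,
 * versus 2 loads per multiply-add in the naive inner loop. */
void matmul_blocked_2x2(size_t n, const double *a, const double *b, double *c)
{
    assert(n % 2 == 0); /* simplifying assumption: no cleanup loops */

    for (size_t i = 0; i < n; i += 2) {
        for (size_t j = 0; j < n; j += 2) {
            double c00 = c[i * n + j];
            double c01 = c[i * n + j + 1];
            double c10 = c[(i + 1) * n + j];
            double c11 = c[(i + 1) * n + j + 1];

            for (size_t k = 0; k < n; k++) {
                double a0 = a[i * n + k];       /* row i of A     */
                double a1 = a[(i + 1) * n + k]; /* row i+1 of A   */
                double b0 = b[k * n + j];       /* column j of B   */
                double b1 = b[k * n + j + 1];   /* column j+1 of B */

                /* Four independent multiply-add chains. */
                c00 += a0 * b0;
                c01 += a0 * b1;
                c10 += a1 * b0;
                c11 += a1 * b1;
            }

            c[i * n + j]           = c00;
            c[i * n + j + 1]       = c01;
            c[(i + 1) * n + j]     = c10;
            c[(i + 1) * n + j + 1] = c11;
        }
    }
}
```

Both functions accumulate each element of C in the same order over k, so for the same inputs they are expected to agree. A production kernel would also block for cache and handle odd dimensions, which this sketch deliberately leaves out.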

References

  1. IBM RISC System/6000 Processor. IBM Journal of Research and Development, Volume 34, Number 1, 1–136, January 1990.

  2. POWER2 and PowerPC Architecture and Implementation. IBM Journal of Research and Development, Volume 38, Number 5, 489–648, September 1994.

  3. R. C. Agarwal, F. G. Gustavson, and M. Zubair. Exploiting functional parallelism of POWER2 to design high-performance numerical algorithms. IBM Journal of Research and Development, 38(5):563–576, 1994.

  4. R. C. Agarwal, F. G. Gustavson, and M. Zubair. Improving performance of linear algebra algorithms for dense matrices using prefetch. IBM Journal of Research and Development, 38(3):265–275, 1994.

  5. E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, and D. Sorensen. LAPACK Users' Guide. SIAM, Philadelphia, PA, 2nd edition, 1994. Also available online from http://www.netlib.org.

  6. E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. LAPACK: A portable linear algebra library for high-performance computers. Technical Report CS-90-105 (LAPACK Working Note 20), Computer Science Department, University of Tennessee, Knoxville, Tennessee, 1990. Also available online from http://www.netlib.org/lapack/lawns.

  7. Jack J. Dongarra, Jeremy Du Croz, Sven Hammarling, and Ian Duff. Algorithm 679. A set of level 3 basic linear algebra subprograms: Model implementation and test programs. ACM Transactions on Mathematical Software, 16(1):18–28, 1990.

  8. Jack J. Dongarra, Jeremy Du Croz, Sven Hammarling, and Ian Duff. A set of level 3 basic linear algebra subprograms. ACM Transactions on Mathematical Software, 16(1):1–17, 1990.

  9. Jack J. Dongarra, Jeremy Du Croz, Sven Hammarling, and Richard J. Hanson. Algorithm 656. An extended set of basic linear algebra subprograms: Model implementation and test programs. ACM Transactions on Mathematical Software, 14(1):18–32, 1988.

  10. Jack J. Dongarra, Jeremy Du Croz, Sven Hammarling, and Richard J. Hanson. An extended set of FORTRAN basic linear algebra subprograms. ACM Transactions on Mathematical Software, 14(1):1–17, 1988.

  11. IBM Corporation. Engineering and Scientific Subroutine Library, Version 2 Release 2: Guide and Reference, 2nd edition, 1994. Publication number SC23-0526-01.

  12. C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh. Basic linear algebra subprograms for Fortran usage. ACM Transactions on Mathematical Software, 5(3):308–323, 1979.

Editor information

Jerzy Waśniewski Jack Dongarra Kaj Madsen Dorte Olesen

Copyright information

© 1996 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Agarwal, R.C., Gustavson, F.G., Zubair, M. (1996). Performance tuning on IBM RS/6000 POWER2 systems. In: Waśniewski, J., Dongarra, J., Madsen, K., Olesen, D. (eds) Applied Parallel Computing Industrial Computation and Optimization. PARA 1996. Lecture Notes in Computer Science, vol 1184. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-62095-8_1

  • DOI: https://doi.org/10.1007/3-540-62095-8_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-62095-2

  • Online ISBN: 978-3-540-49643-4

  • eBook Packages: Springer Book Archive
