Performance tuning on IBM RS/6000 POWER2 systems

Agarwal, R. C.; Gustavson, F. G.; Zubair, M.

doi:10.1007/3-540-62095-8_1


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1184)

Included in the following conference series:

International Workshop on Applied Parallel Computing


Abstract

This tutorial describes the algorithms and architecture approach to producing high-performance codes for numerically intensive computations. In this approach, for a given computation, we design algorithms so that they perform optimally when run on a target machine, in this case the new POWER2™ machines from the RS/6000 family of RISC processors. The algorithmic features that we emphasize are functional parallelism, cache/register blocking, algorithmic prefetching, and loop unrolling. The architectural features of the POWER2 machine that we describe and that lead to high performance are multiple functional units; high bandwidth between registers, cache, and memory; a large number of fixed- and floating-point registers; and a large cache and TLB (translation lookaside buffer). The paper gives BLAS examples that illustrate how the algorithmic and architectural features interplay to produce high-performance codes. These routines are included in ESSL (Engineering and Scientific Subroutine Library); an overview of ESSL is also given in the paper.

This paper is a condensation of [3] and is a formal presentation of the concepts presented in the tutorial.
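
The loop unrolling, functional parallelism, and register blocking named in the abstract can be illustrated with a minimal sketch in C. This is not the ESSL code and not the authors' kernels; the function names `dot_unrolled4` and `matmul_2x2`, the row-major layout, and the restriction to even `n` are assumptions made only for illustration.

```c
#include <stddef.h>

/* Dot product unrolled by 4 with four independent accumulators.
 * The partial sums do not depend on one another, so a processor with
 * several floating-point units can overlap the multiply-adds instead
 * of serializing them on a single running sum. */
double dot_unrolled4(const double *x, const double *y, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }
    for (; i < n; i++)          /* remainder when n is not a multiple of 4 */
        s0 += x[i] * y[i];

    return (s0 + s1) + (s2 + s3);
}

/* C += A * B for row-major n-by-n matrices (assumption: n is even).
 * A 2x2 block of C is held in four scalars for the whole k loop
 * (register blocking), so each iteration performs four multiply-adds
 * from four loads and stores to C only once per block. */
void matmul_2x2(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i += 2) {
        for (size_t j = 0; j < n; j += 2) {
            double c00 = c[i * n + j];
            double c01 = c[i * n + j + 1];
            double c10 = c[(i + 1) * n + j];
            double c11 = c[(i + 1) * n + j + 1];

            for (size_t k = 0; k < n; k++) {
                double a0 = a[i * n + k];
                double a1 = a[(i + 1) * n + k];
                double b0 = b[k * n + j];
                double b1 = b[k * n + j + 1];

                c00 += a0 * b0;
                c01 += a0 * b1;
                c10 += a1 * b0;
                c11 += a1 * b1;
            }

            c[i * n + j]           = c00;
            c[i * n + j + 1]       = c01;
            c[(i + 1) * n + j]     = c10;
            c[(i + 1) * n + j + 1] = c11;
        }
    }
}
```

The unroll factor and block size here are arbitrary small values chosen for readability. A tuned kernel would pick them to match the register count and functional units of the target machine, and would add cache blocking around these loops.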


References

[1] IBM RISC System/6000 Processor. IBM Journal of Research and Development, Volume 34, Number 1, 1–136, January 1990.
[2] POWER2 and PowerPC Architecture and Implementation. IBM Journal of Research and Development, Volume 38, Number 5, 489–648, September 1994.
[3] R. C. Agarwal, F. G. Gustavson, and M. Zubair. Exploiting functional parallelism of POWER2 to design high-performance numerical algorithms. IBM Journal of Research and Development, 38(5):563–576, 1994.
[4] R. C. Agarwal, F. G. Gustavson, and M. Zubair. Improving performance of linear algebra algorithms for dense matrices using prefetch. IBM Journal of Research and Development, 38(3):265–275, 1994.
[5] E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, and D. Sorensen. LAPACK Users' Guide. SIAM, Philadelphia, PA, 2nd edition, 1994. Also available online from http://www.netlib.org.
[6] E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. LAPACK: A portable linear algebra library for high-performance computers. Technical Report CS-90-105 (LAPACK Working Note 20), Computer Science Department, University of Tennessee, Knoxville, Tennessee, 1990. Also available online from http://www.netlib.org/lapack/lawns.
[7] Jack J. Dongarra, Jeremy Du Croz, Sven Hammarling, and Ian Duff. Algorithm 679. A set of level 3 basic linear algebra subprograms: Model implementation and test programs. ACM Transactions on Mathematical Software, 16(1):18–28, 1990.
[8] Jack J. Dongarra, Jeremy Du Croz, Sven Hammarling, and Ian Duff. A set of level 3 basic linear algebra subprograms. ACM Transactions on Mathematical Software, 16(1):1–17, 1990.
[9] Jack J. Dongarra, Jeremy Du Croz, Sven Hammarling, and Richard J. Hanson. Algorithm 656. An extended set of basic linear algebra subprograms: Model implementation and test programs. ACM Transactions on Mathematical Software, 14(1):18–32, 1988.
[10] Jack J. Dongarra, Jeremy Du Croz, Sven Hammarling, and Richard J. Hanson. An extended set of FORTRAN basic linear algebra subprograms. ACM Transactions on Mathematical Software, 14(1):1–17, 1988.
[11] IBM Corporation. Engineering and Scientific Subroutine Library, Version 2 Release 2: Guide and Reference, 2nd edition, 1994. Publication number SC23-0526-01.
[12] C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh. Basic linear algebra subprograms for Fortran usage. ACM Transactions on Mathematical Software, 5(3):308–323, 1979.


Author information

Authors and Affiliations

IBM T.J. Watson Research Center, P.O. Box 218, 10598, Yorktown Heights, NY, USA
R. C. Agarwal, F. G. Gustavson & M. Zubair


Editor information

Jerzy Waśniewski, Jack Dongarra, Kaj Madsen, Dorte Olesen


About this paper

Cite this paper

Agarwal, R.C., Gustavson, F.G., Zubair, M. (1996). Performance tuning on IBM RS/6000 POWER2 systems. In: Waśniewski, J., Dongarra, J., Madsen, K., Olesen, D. (eds) Applied Parallel Computing: Industrial Computation and Optimization. PARA 1996. Lecture Notes in Computer Science, vol 1184. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-62095-8_1


DOI: https://doi.org/10.1007/3-540-62095-8_1
Published: 07 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-62095-2
Online ISBN: 978-3-540-49643-4
eBook Packages: Springer Book Archive
