Abstract
This tutorial describes the algorithms and architecture approach to producing high-performance codes for numerically intensive computations. In this approach, for a given computation, we design algorithms so that they perform optimally when run on a target machine, in this case the new POWER2™ machines from the RS/6000 family of RISC processors. The algorithmic features that we emphasize are functional parallelism, cache/register blocking, algorithmic prefetching, and loop unrolling. The architectural features of the POWER2 machine that we describe, and that lead to high performance, are multiple functional units; high bandwidth between registers, cache, and memory; a large number of fixed- and floating-point registers; and a large cache and TLB (translation lookaside buffer). The paper gives BLAS examples that illustrate how the algorithmic and architectural features interact to produce high-performance codes. These routines are included in ESSL (Engineering and Scientific Subroutine Library); an overview of ESSL is also given in the paper.
This paper is a condensation of [3] and is a formal presentation of the concepts presented in the tutorial.
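To make the algorithmic features named in the abstract concrete, the following is a minimal sketch in C of a matrix multiply update (C = C + A*B, row-major, n by n) that combines cache blocking, 2x2 register blocking, and loop unrolling. It is not the ESSL implementation and is not taken from the paper. The function names, the block size NB, and the restriction to even n are assumptions made for illustration only. A naive version is included as a reference for checking results.

```c
#include <assert.h>
#include <stddef.h>

#define NB 32 /* hypothetical cache block edge (assumption); must be even */

static size_t min_sz(size_t x, size_t y) { return x < y ? x : y; }

/* Reference version: C += A*B with a plain triple loop. */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] += sum;
        }
}

/* Blocked version: C += A*B.
 * Outer three loops walk NB x NB tiles so the working set can stay in cache.
 * Inner loops are unrolled by 2 in i and j, keeping a 2x2 tile of C in
 * scalar variables (register blocking). Each k step then performs four
 * independent multiply-adds from only four loads. */
void matmul_blocked(size_t n, const double *a, const double *b, double *c)
{
    assert(n % 2 == 0); /* sketch only: odd n would need cleanup loops */

    for (size_t ii = 0; ii < n; ii += NB) {
        size_t i_end = min_sz(ii + NB, n);
        for (size_t kk = 0; kk < n; kk += NB) {
            size_t k_end = min_sz(kk + NB, n);
            for (size_t jj = 0; jj < n; jj += NB) {
                size_t j_end = min_sz(jj + NB, n);
                for (size_t i = ii; i < i_end; i += 2)
                    for (size_t j = jj; j < j_end; j += 2) {
                        /* Load the 2x2 tile of C once per (kk) block. */
                        double c00 = c[i * n + j];
                        double c01 = c[i * n + j + 1];
                        double c10 = c[(i + 1) * n + j];
                        double c11 = c[(i + 1) * n + j + 1];
                        for (size_t k = kk; k < k_end; k++) {
                            double a0 = a[i * n + k];
                            double a1 = a[(i + 1) * n + k];
                            double b0 = b[k * n + j];
                            double b1 = b[k * n + j + 1];
                            c00 += a0 * b0;
                            c01 += a0 * b1;
                            c10 += a1 * b0;
                            c11 += a1 * b1;
                        }
                        /* Store the tile back after the k loop. */
                        c[i * n + j] = c00;
                        c[i * n + j + 1] = c01;
                        c[(i + 1) * n + j] = c10;
                        c[(i + 1) * n + j + 1] = c11;
                    }
            }
        }
    }
}
```

In this sketch the 2x2 tile gives four multiply-adds for every four loads in the inner loop, compared with one multiply-add for every two loads in the naive loop. The four updates are also independent of one another, which is the kind of work a machine with multiple floating-point units can overlap. A suitable value for NB depends on the cache size of the target machine and would have to be tuned.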
References
IBM RISC System/6000 Processor. IBM Journal of Research and Development, Volume 34, Number 1, 1–136, January 1990.
POWER2 and PowerPC Architecture and Implementation. IBM Journal of Research and Development, Volume 38, Number 5, 489–648, September 1994.
R. C. Agarwal, F. G. Gustavson, and M. Zubair. Exploiting functional parallelism of POWER2 to design high-performance numerical algorithms. IBM Journal of Research and Development, 38(5):563–576, 1994.
R. C. Agarwal, F. G. Gustavson, and M. Zubair. Improving performance of linear algebra algorithms for dense matrices using prefetch. IBM Journal of Research and Development, 38(3):265–275, 1994.
E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, and D. Sorensen. LAPACK Users' Guide. SIAM, Philadelphia, PA, 2nd edition, 1994. Also available online from http://www.netlib.org.
E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. LAPACK: A portable linear algebra library for high-performance computers. Technical Report CS-90-105 (LAPACK Working Note 20), Computer Science Department, University of Tennessee, Knoxville, Tennessee, 1990. Also available online from http://www.netlib.org/lapack/lawns.
Jack J. Dongarra, Jeremy Du Croz, Sven Hammarling, and Ian Duff. Algorithm 679. A set of level 3 basic linear algebra subprograms: Model implementation and test programs. ACM Transactions on Mathematical Software, 16(1):18–28, 1990.
Jack J. Dongarra, Jeremy Du Croz, Sven Hammarling, and Ian Duff. A set of level 3 basic linear algebra subprograms. ACM Transactions on Mathematical Software, 16(1):1–17, 1990.
Jack J. Dongarra, Jeremy Du Croz, Sven Hammarling, and Richard J. Hanson. Algorithm 656. An extended set of basic linear algebra subprograms: Model implementation and test programs. ACM Transactions on Mathematical Software, 14(1):18–32, 1988.
Jack J. Dongarra, Jeremy Du Croz, Sven Hammarling, and Richard J. Hanson. An extended set of FORTRAN basic linear algebra subprograms. ACM Transactions on Mathematical Software, 14(1):1–17, 1988.
IBM Corporation. Engineering and Scientific Subroutine Library, Version 2 Release 2: Guide and Reference, 2nd edition, 1994. Publication number SC23-0526-01.
C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh. Basic linear algebra subprograms for Fortran usage. ACM Transactions on Mathematical Software, 5(3):308–323, 1979.
Copyright information
© 1996 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Agarwal, R.C., Gustavson, F.G., Zubair, M. (1996). Performance tuning on IBM RS/6000 POWER2 systems. In: Waśniewski, J., Dongarra, J., Madsen, K., Olesen, D. (eds) Applied Parallel Computing Industrial Computation and Optimization. PARA 1996. Lecture Notes in Computer Science, vol 1184. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-62095-8_1
DOI: https://doi.org/10.1007/3-540-62095-8_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-62095-2
Online ISBN: 978-3-540-49643-4
eBook Packages: Springer Book Archive