A set of high-performance Level 3 BLAS structured and tuned for the IBM 3090 VF and implemented in Fortran 77

The Journal of Supercomputing

Abstract

Fortran 77 implementations of the Level 3 Basic Linear Algebra Subprograms (BLAS) in double precision, structured and tuned to achieve high performance on the IBM 3090 VF, are presented. The implementations are designed to exploit the memory hierarchy and the vector processor efficiently. Efficient cache reuse is provided by a method for matrix blocking adapted to the memory hierarchy. Vector registers and compound vector instructions are used efficiently through carefully designed Fortran code constructs. Performance results generally show speed comparable to the highly tuned IBM ESSL library. In some cases our implementations are actually faster than ESSL. The generality of the program design and the use of Fortran 77 make the implementations portable and well suited to serve as design platforms for other machines with similar architectures.
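The blocking idea mentioned in the abstract can be illustrated with a small sketch of the Level 3 BLAS operation C := alpha*A*B + beta*C. This is not the paper's Fortran 77 code and makes no claim about its actual loop structure or block sizes: the function name `gemm_blocked`, the `block` parameter, and the loop ordering are assumptions chosen only to show how a tile of A can be reused across several columns of B and C while it would still be resident in cache. Python is used for readability only.

```python
def gemm_blocked(alpha, a, b, beta, c, block=2):
    """Compute C := alpha*A*B + beta*C in place, one tile at a time.

    a is m x k, b is k x n, c is m x n, each given as a list of rows.
    `block` is a hypothetical tile size; a tuned code would derive it
    from the cache size of the target machine.
    """
    m, k, n = len(a), len(b), len(b[0])

    # Scale C once up front so the blocked loops only accumulate.
    for i in range(m):
        for j in range(n):
            c[i][j] *= beta

    for jj in range(0, n, block):          # block of columns of B and C
        for ll in range(0, k, block):      # block along the inner dimension
            for ii in range(0, m, block):  # block of rows of A and C
                # The A tile (rows ii.., columns ll..) is reused for every
                # column j in the current column block.
                for j in range(jj, min(jj + block, n)):
                    for l in range(ll, min(ll + block, k)):
                        t = alpha * b[l][j]
                        # Innermost loop is a multiply-add down one column
                        # of C, the shape a vector unit handles well when
                        # the matrices are stored column by column.
                        for i in range(ii, min(ii + block, m)):
                            c[i][j] += t * a[i][l]
    return c
```

Calling `gemm_blocked(2, a, b, 3, c)` with small integer matrices gives the same result as the unblocked triple loop. Only the order in which the products are accumulated changes, which is what allows operands to be reused from fast memory.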




Cite this article

Ling, P. A set of high-performance Level 3 BLAS structured and tuned for the IBM 3090 VF and implemented in Fortran 77. J Supercomput 7, 323–355 (1993). https://doi.org/10.1007/BF01206242


  • DOI: https://doi.org/10.1007/BF01206242
