Abstract
Fortran 77 implementations of the Level 3 Basic Linear Algebra Subprograms (BLAS) in double precision, structured and tuned to achieve high performance on the IBM 3090 VF, are presented. The implementations are designed to exploit the memory hierarchy and the vector processor efficiently. Efficient cache reuse is provided by a method for matrix blocking adapted to the memory hierarchy. Vector registers and compound vector instructions are used efficiently through carefully designed Fortran code constructs. Performance results generally show speeds comparable to those of the highly tuned IBM ESSL library, and in some cases our implementations are actually faster than ESSL. The generality of the program design and the use of Fortran 77 make the implementations portable and well suited to serve as design platforms for other machines with similar architectures.
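To make the idea of matrix blocking for cache reuse concrete, here is a minimal sketch of a blocked DGEMM (C := alpha*A*B + beta*C, the central Level 3 BLAS operation). It is written in C rather than Fortran 77 and is not the paper's code: the function names `dgemm_ref`, `dgemm_blocked` and `max_abs_diff` and the block-size parameters `mb` and `kb` are hypothetical, and the block sizes the paper tunes to the IBM 3090 VF memory hierarchy are not reproduced here. Storage is column-major, as in Fortran, so the innermost loop runs down a column with unit stride.

```c
#include <assert.h>
#include <stddef.h>

/* Column-major storage, as in Fortran 77: element (i,j) of an array with
   leading dimension ld lives at x[i + j*ld] (indices are 0-based here). */
#define IDX(i, j, ld) ((size_t)(i) + (size_t)(j) * (size_t)(ld))

static int min_int(int x, int y) { return x < y ? x : y; }

/* Unblocked reference: C := alpha*A*B + beta*C, with A m-by-k, B k-by-n. */
void dgemm_ref(int m, int n, int k, double alpha,
               const double *a, int lda, const double *b, int ldb,
               double beta, double *c, int ldc)
{
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++) {
            double s = 0.0;
            for (int p = 0; p < k; p++)
                s += a[IDX(i, p, lda)] * b[IDX(p, j, ldb)];
            /* As in the BLAS, beta == 0 means C is not read on input. */
            c[IDX(i, j, ldc)] = (beta == 0.0)
                ? alpha * s
                : alpha * s + beta * c[IDX(i, j, ldc)];
        }
}

/* Blocked variant with hypothetical block sizes mb (rows of A) and kb
   (columns of A). Each mb-by-kb block of A is reused for all n columns
   of C before the next block is touched, so it can stay in cache. */
void dgemm_blocked(int m, int n, int k, double alpha,
                   const double *a, int lda, const double *b, int ldb,
                   double beta, double *c, int ldc, int mb, int kb)
{
    assert(mb > 0 && kb > 0);

    /* Apply beta once, up front; beta == 0 overwrites C. */
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++)
            c[IDX(i, j, ldc)] = (beta == 0.0) ? 0.0 : beta * c[IDX(i, j, ldc)];

    for (int pp = 0; pp < k; pp += kb) {
        int pend = min_int(pp + kb, k);
        for (int ii = 0; ii < m; ii += mb) {
            int iend = min_int(ii + mb, m);
            for (int j = 0; j < n; j++)
                for (int p = pp; p < pend; p++) {
                    /* Column update c(ii:iend-1,j) += t * a(ii:iend-1,p):
                       a unit-stride multiply-add loop, the kind a
                       vectorizing compiler can map to compound
                       (multiply-and-add) vector instructions. */
                    double t = alpha * b[IDX(p, j, ldb)];
                    for (int i = ii; i < iend; i++)
                        c[IDX(i, j, ldc)] += t * a[IDX(i, p, lda)];
                }
        }
    }
}

/* Largest absolute elementwise difference between two length-len arrays. */
double max_abs_diff(int len, const double *x, const double *y)
{
    double worst = 0.0;
    for (int i = 0; i < len; i++) {
        double d = x[i] - y[i];
        if (d < 0.0) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}
```

A quick way to use the sketch is to run `dgemm_ref` and `dgemm_blocked` on the same small inputs and check that `max_abs_diff` of the two results is negligible. Choosing block sizes that do not divide the matrix dimensions exercises the partial edge blocks. Only the loop order and grouping change between the two versions; the arithmetic is the same, which is why blocking can be tuned for a given memory hierarchy without affecting the result beyond rounding.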
Cite this article
Ling, P. A set of high-performance Level 3 BLAS structured and tuned for the IBM 3090 VF and implemented in Fortran 77. J Supercomput 7, 323–355 (1993). https://doi.org/10.1007/BF01206242