Abstract
We believe that the availability of portable and efficient serial and parallel numerical libraries that can be used as building blocks is extremely important for both simplifying application software development and improving reliability.
This is illustrated by considering the solution of full and sparse linear systems. We describe successive layers of computational kernels such as the BLAS, the sparse BLAS, blocked algorithms for factorizing full systems, and direct and iterative methods for sparse linear systems.
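The sketch below makes the idea of a sparse BLAS kernel concrete. It computes the sparse matrix-vector product y = Ax with A held in compressed sparse row (CSR) storage. The function name `csr_matvec` and the choice of CSR are illustrative assumptions, not the interface of any particular sparse BLAS proposal. The point is that the kernel touches only the stored nonzeros, through the indirect index `col_idx`, and this is what distinguishes sparse kernels from their dense counterparts.

```python
def csr_matvec(row_ptr, col_idx, val, x):
    """Return y = A @ x for a matrix A stored in compressed sparse row (CSR) form.

    Hypothetical helper for illustration. row_ptr has n+1 entries; the nonzeros
    of row i are val[row_ptr[i]:row_ptr[i+1]], located in columns
    col_idx[row_ptr[i]:row_ptr[i+1]].
    """
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        s = 0.0
        # Visit only the stored nonzeros of row i (indirect access via col_idx).
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += val[k] * x[col_idx[k]]
        y[i] = s
    return y


# 3x3 example:  [[4, 0, 1],
#                [0, 3, 0],
#                [2, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
val = [4.0, 1.0, 3.0, 2.0, 5.0]
print(csr_matvec(row_ptr, col_idx, val, [1.0, 2.0, 3.0]))  # [7.0, 6.0, 17.0]
```

Only five of the nine entries are stored and multiplied. For large sparse matrices the saving in storage and operations is what makes this storage scheme worthwhile.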
We also show how the architecture of today's powerful RISC processors may influence efficient code design.
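One way processor architecture shows up in code design is loop blocking (tiling) for the cache hierarchy of RISC processors. The sketch below uses a hypothetical `blocked_gemm` routine, written in plain Python for readability, and shows only the loop structure of a blocked update C ← C + AB. It will not be fast in Python. In a compiled Level 3 BLAS kernel, the block size `nb` would be tuned to the cache (and register) sizes of the target machine.

```python
def blocked_gemm(a, b, c, nb):
    """Compute C <- C + A*B in place using nb x nb loop blocking (tiling).

    Illustrative sketch: a is m x k, b is k x n, c is m x n, all stored as
    lists of row lists. Each (ii, jj, kk) step updates one tile of C using one
    tile of A and one tile of B; in compiled code nb would be chosen so that
    these tiles stay resident in cache while they are reused.
    """
    m, k, n = len(a), len(b), len(b[0])
    for ii in range(0, m, nb):
        for jj in range(0, n, nb):
            for kk in range(0, k, nb):
                # Multiply the current tiles; min() handles ragged edge tiles.
                for i in range(ii, min(ii + nb, m)):
                    ci = c[i]
                    for p in range(kk, min(kk + nb, k)):
                        aip = a[i][p]
                        bp = b[p]
                        for j in range(jj, min(jj + nb, n)):
                            ci[j] += aip * bp[j]
    return c


a = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
b = [[7.0, 8.0],
     [9.0, 10.0],
     [11.0, 12.0]]
c = [[0.0, 0.0],
     [0.0, 0.0]]
print(blocked_gemm(a, b, c, nb=2))  # [[58.0, 64.0], [139.0, 154.0]]
```

The i, p, j ordering of the three innermost loops keeps the inner loop running along a row of B and of C, which is the contiguous direction for row-major (C-style) arrays. A column-major Fortran code would normally order the loops so that the inner loop runs down columns instead.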
Part of this work was funded by Conseil Régional Midi-Pyrénées under project DAE1/RECH/9308020 and by the Alliance Program from the British Council.
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Daydé, M.J., Duff, I.S. (1997). The use of computational kernels in full and sparse linear solvers, efficient code design on high-performance RISC processors. In: Palma, J.M.L.M., Dongarra, J. (eds) Vector and Parallel Processing — VECPAR'96. VECPAR 1996. Lecture Notes in Computer Science, vol 1215. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-62828-2_116
DOI: https://doi.org/10.1007/3-540-62828-2_116
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-62828-6
Online ISBN: 978-3-540-68699-6
eBook Packages: Springer Book Archive