The use of computational kernels in full and sparse linear solvers, efficient code design on high-performance RISC processors

Daydé, Michel J.; Duff, Iain S.

doi:10.1007/3-540-62828-2_116

Michel J. Daydé¹ & Iain S. Duff²,³

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1215)

Included in the following conference series:

International Conference on Vector and Parallel Processing


Abstract

We believe that the availability of portable and efficient serial and parallel numerical libraries that can be used as building blocks is extremely important for both simplifying application software development and improving reliability.

This is illustrated by considering the solution of full and sparse linear systems. We describe successive layers of computational kernels: the BLAS, the sparse BLAS, blocked algorithms for factorizing full systems, and direct and iterative methods for sparse linear systems.

We also show how the architecture of today's powerful RISC processors may influence efficient code design.
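
As an illustration of what a blocked kernel looks like, the following is a minimal sketch of a tiled matrix-matrix product, the operation at the heart of the Level 3 BLAS. It is not the authors' code: the function name `blocked_matmul` and the `block` parameter are hypothetical, and pure Python is used only to show the loop structure, not to obtain performance. The idea is that each tile of the operands is reused many times while it is small enough to stay in cache, which is the kind of consideration that a RISC memory hierarchy brings to code design.

```python
def blocked_matmul(a, b, block=2):
    """Return the product a * b of two list-of-lists matrices.

    The three outer loops walk over tiles of size `block`; the three
    inner loops multiply one tile of `a` by one tile of `b` and
    accumulate into the matching tile of the result.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for ii in range(0, n, block):              # tile rows of a and c
        for kk in range(0, k, block):          # tile along the shared dimension
            for jj in range(0, m, block):      # tile columns of b and c
                for i in range(ii, min(ii + block, n)):
                    row_a, row_c = a[i], c[i]
                    for p in range(kk, min(kk + block, k)):
                        a_ip = row_a[p]
                        row_b = b[p]
                        for j in range(jj, min(jj + block, m)):
                            row_c[j] += a_ip * row_b[j]
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(blocked_matmul(a, b, block=2))
```

In a compiled language the tile size would be tuned to the cache and register file of the target processor; the value 2 here is only chosen to exercise the edge handling on a small example.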

Part of this work was funded by Conseil Régional Midi-Pyrénées under project DAE1/RECH/9308020 and by the Alliance Program from the British Council.



Author information

Authors and Affiliations

ENSEEIHT-IRIT, 2 rue Camichel, 31071, Toulouse Cedex, France
Michel J. Daydé
Rutherford Appleton Laboratory, OX11 0QX, Oxfordshire, England
Iain S. Duff
CERFACS, 42 av. G. Coriolis, 31057, Toulouse Cedex, France
Iain S. Duff


Editor information

José M. L. M. Palma, Jack Dongarra


Cite this paper

Daydé, M.J., Duff, I.S. (1997). The use of computational kernels in full and sparse linear solvers, efficient code design on high-performance RISC processors. In: Palma, J.M.L.M., Dongarra, J. (eds) Vector and Parallel Processing — VECPAR'96. VECPAR 1996. Lecture Notes in Computer Science, vol 1215. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-62828-2_116


DOI: https://doi.org/10.1007/3-540-62828-2_116
Published: 05 August 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-62828-6
Online ISBN: 978-3-540-68699-6
eBook Packages: Springer Book Archive
