The use of computational kernels in full and sparse linear solvers, efficient code design on high-performance RISC processors

  • Conference paper
Vector and Parallel Processing — VECPAR'96 (VECPAR 1996)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1215)

Abstract

We believe that the availability of portable and efficient serial and parallel numerical libraries that can be used as building blocks is extremely important for both simplifying application software development and improving reliability.

This is illustrated by considering the solution of full and sparse linear systems. We describe successive layers of computational kernels such as the BLAS, the sparse BLAS, blocked algorithms for factorizing full systems, and direct and iterative methods for sparse linear systems.
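The layering idea can be illustrated with a minimal sketch, which is not taken from the paper: a sparse matrix-vector product plays the role of a sparse BLAS-style kernel, and an unpreconditioned conjugate gradient solver is built on top of it. The compressed sparse row layout, the function names, and the small test matrix are all assumptions chosen for illustration.

```python
# Illustrative sketch only: a kernel layer and a solver layer built on it.
# Names and data layout (CSR) are assumptions, not the paper's interface.

def csr_matvec(n, row_ptr, col_idx, values, x):
    """Kernel layer: y = A x for A stored in compressed sparse row form."""
    y = [0.0] * n
    for i in range(n):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def dot(u, v):
    """Kernel layer: inner product of two dense vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(n, row_ptr, col_idx, values, b, tol=1e-12, max_iter=100):
    """Solver layer: unpreconditioned CG for a symmetric positive definite A."""
    x = [0.0] * n
    r = list(b)              # residual b - A x for the initial guess x = 0
    p = list(r)              # first search direction
    rr = dot(r, r)
    for _ in range(max_iter):
        if rr <= tol * tol:  # stop once the residual norm is below tol
            break
        ap = csr_matvec(n, row_ptr, col_idx, values, p)
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


# Hypothetical test matrix: [[4, -1, 0], [-1, 4, -1], [0, -1, 4]] in CSR form.
ROW_PTR = [0, 2, 5, 7]
COL_IDX = [0, 1, 0, 1, 2, 1, 2]
VALUES = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]

if __name__ == "__main__":
    print(conjugate_gradient(3, ROW_PTR, COL_IDX, VALUES, [2.0, 4.0, 10.0]))
```

In this arrangement the solver only touches the matrix through `csr_matvec`, so a tuned kernel could in principle be swapped in without changing the solver code.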

We also show how the architecture of today's powerful RISC processors may influence efficient code design.
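One common way in which a cache-based memory hierarchy shapes code is loop blocking (tiling), where a matrix product is computed tile by tile so that each tile can be reused while it is still in fast memory. The sketch below is a generic illustration of that loop structure, not the implementation described in the paper; the function name and the `block` parameter are assumptions, and pure Python shows only the ordering of the loops, not the performance effect.

```python
# Illustrative sketch only: loop blocking for C = A B on dense matrices
# stored as lists of rows. `block` is a hypothetical tile size.

def matmul_blocked(a, b, block=2):
    """Return A B, visiting the operands one block-by-block tile at a time."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, k, block):
            for jj in range(0, m, block):
                # Update one tile of C from one tile of A and one tile of B.
                for i in range(ii, min(ii + block, n)):
                    row_c = c[i]
                    for p in range(kk, min(kk + block, k)):
                        a_ip = a[i][p]
                        row_b = b[p]
                        for j in range(jj, min(jj + block, m)):
                            row_c[j] += a_ip * row_b[j]
    return c


if __name__ == "__main__":
    A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    B = [[1.0, 0.0], [0.0, 1.0], [2.0, 3.0]]
    print(matmul_blocked(A, B, block=2))
```

In a compiled language the tile size would typically be chosen so that the working tiles fit in cache; the `min(...)` bounds handle matrix dimensions that are not multiples of the tile size.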

Part of this work was funded by Conseil Régional Midi-Pyrénées under project DAE1/RECH/9308020 and by the Alliance Program from the British Council.

Editor information

José M. L. M. Palma, Jack Dongarra

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Daydé, M.J., Duff, I.S. (1997). The use of computational kernels in full and sparse linear solvers, efficient code design on high-performance RISC processors. In: Palma, J.M.L.M., Dongarra, J. (eds) Vector and Parallel Processing — VECPAR'96. VECPAR 1996. Lecture Notes in Computer Science, vol 1215. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-62828-2_116

  • DOI: https://doi.org/10.1007/3-540-62828-2_116

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-62828-6

  • Online ISBN: 978-3-540-68699-6

  • eBook Packages: Springer Book Archive
