The design, implementation, and evaluation of a banded linear solver for distributed-memory parallel computers

  • Conference paper
Applied Parallel Computing Industrial Computation and Optimization (PARA 1996)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1184)

Abstract

This paper describes the design, implementation, and evaluation of a parallel algorithm for the Cholesky factorization of banded matrices. The algorithm is part of IBM's Parallel Engineering and Scientific Subroutine Library version 1.2 and is compatible with ScaLAPACK's banded solver. Analysis, as well as experiments on an IBM SP2 distributed-memory parallel computer, show that the algorithm efficiently factors banded matrices with wide bandwidth. For example, a 31-node SP2 factors a large matrix more than 16 times faster than a single node would factor it using the best sequential algorithm, and more than 20 times faster than a single node would using LAPACK's DPBTRF. The algorithm uses novel ideas in the area of distributed dense matrix computations that include the use of a dynamic schedule for a blocked systolic-like algorithm and the separation of the input and output data layouts from the layout the algorithm uses internally. The algorithm also uses known techniques such as blocking to improve its communication-to-computation ratio and its data-cache behavior.
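As a point of reference for the computation the abstract describes, the following is a minimal sequential sketch of band Cholesky factorization. It is not the paper's parallel algorithm, and it is not LAPACK's DPBTRF: it is unblocked, and it keeps the matrix in a full square array for readability where a production solver would use compact band storage. The function name `band_cholesky` and the half-bandwidth parameter `bw` are illustrative assumptions.

```python
import math


def band_cholesky(A, bw):
    """Return lower-triangular L with A = L * L^T.

    A is a symmetric positive-definite matrix (list of lists) whose
    entries are zero whenever |i - j| > bw. Illustrative sketch only:
    unblocked, sequential, full (non-band) storage.
    """
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        lo = max(0, j - bw)  # first column that can be nonzero in row j
        d = A[j][j] - sum(L[j][k] ** 2 for k in range(lo, j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(d)
        # Only rows j+1 .. j+bw fall inside the band below the diagonal.
        for i in range(j + 1, min(n, j + bw + 1)):
            lo_i = max(0, i - bw)  # L[i][k] is zero for k < i - bw
            s = sum(L[i][k] * L[j][k] for k in range(lo_i, j))
            L[i][j] = (A[i][j] - s) / L[j][j]
    return L
```

Because every inner loop is limited to the band, the work grows roughly as n times bw squared rather than n cubed. For wide bands, each column step updates a sizable dense block, and that is where the blocking the abstract mentions applies.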

Editor information

Jerzy Waśniewski, Jack Dongarra, Kaj Madsen, Dorte Olesen

Copyright information

© 1996 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gupta, A., Gustavson, F.G., Joshi, M., Toledo, S. (1996). The design, implementation, and evaluation of a banded linear solver for distributed-memory parallel computers. In: Waśniewski, J., Dongarra, J., Madsen, K., Olesen, D. (eds) Applied Parallel Computing Industrial Computation and Optimization. PARA 1996. Lecture Notes in Computer Science, vol 1184. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-62095-8_35

  • DOI: https://doi.org/10.1007/3-540-62095-8_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-62095-2

  • Online ISBN: 978-3-540-49643-4

  • eBook Packages: Springer Book Archive
