The design, implementation, and evaluation of a banded linear solver for distributed-memory parallel computers

Gupta, Anshul; Gustavson, Fred G.; Joshi, Mahesh; Toledo, Sivan

doi:10.1007/3-540-62095-8_35

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1184)

Included in the following conference series:

International Workshop on Applied Parallel Computing

Abstract

This paper describes the design, implementation, and evaluation of a parallel algorithm for the Cholesky factorization of banded matrices. The algorithm is part of IBM's Parallel Engineering and Scientific Subroutine Library version 1.2 and is compatible with ScaLAPACK's banded solver. Analysis, as well as experiments on an IBM SP2 distributed-memory parallel computer, show that the algorithm efficiently factors banded matrices with wide bandwidth. For example, a 31-node SP2 factors a large matrix more than 16 times faster than a single node would factor it using the best sequential algorithm, and more than 20 times faster than a single node would using LAPACK's DPBTRF. The algorithm uses novel ideas in the area of distributed dense matrix computations that include the use of a dynamic schedule for a blocked systolic-like algorithm and the separation of the input and output data layouts from the layout the algorithm uses internally. The algorithm also uses known techniques such as blocking to improve its communication-to-computation ratio and its data-cache behavior.
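
For readers unfamiliar with the underlying computation, the following is a minimal sketch of a sequential, unblocked band Cholesky factorization. It is not the parallel blocked algorithm the paper describes; it only illustrates why a band of half-bandwidth `bw` limits each inner product to at most `bw` terms. The function names `band_cholesky` and `band_reconstruct` are hypothetical, and the storage layout (row `d` holds the `d`-th subdiagonal, so `ab[i - j][j]` is `A[i][j]`) is an assumption chosen to resemble the lower-band layout used by LAPACK's DPBTRF.

```python
import math


def band_cholesky(ab, n, bw):
    """Factor a symmetric positive definite band matrix A = L * L^T.

    ab[d][j] holds A[j + d][j] for d = 0..bw (lower band storage).
    Returns lb in the same layout, with lb[d][j] = L[j + d][j].
    Illustrative textbook algorithm only, not the paper's parallel method.
    """
    lb = [[0.0] * n for _ in range(bw + 1)]
    for j in range(n):
        # Diagonal entry: only columns within the band contribute.
        k0 = max(0, j - bw)
        s = ab[0][j] - sum(lb[j - k][k] ** 2 for k in range(k0, j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        lb[0][j] = math.sqrt(s)
        # Entries below the diagonal, restricted to the band.
        for i in range(j + 1, min(n, j + bw + 1)):
            k0 = max(0, i - bw)
            s = ab[i - j][j] - sum(
                lb[i - k][k] * lb[j - k][k] for k in range(k0, j)
            )
            lb[i - j][j] = s / lb[0][j]
    return lb


def band_reconstruct(lb, n, bw):
    """Recompute the lower band of L * L^T, to check a factorization."""
    ab = [[0.0] * n for _ in range(bw + 1)]
    for j in range(n):
        for i in range(j, min(n, j + bw + 1)):
            k0 = max(0, i - bw)
            ab[i - j][j] = sum(
                lb[i - k][k] * lb[j - k][k] for k in range(k0, j + 1)
            )
    return ab
```

Calling `band_cholesky(ab, n, bw)` on a small diagonally dominant band matrix and passing the result to `band_reconstruct` should reproduce the input band up to rounding. The sketch performs on the order of `n * bw**2` floating-point operations, which is why the bandwidth, rather than the matrix order alone, determines how much work is available to distribute across processors.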

Author information

Authors and Affiliations

IBM T.J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY 10598, USA
Anshul Gupta, Fred G. Gustavson & Sivan Toledo
Department of Computer Science, University of Minnesota, 200 Union Street SE, Minneapolis, MN 55455, USA
Mahesh Joshi

Editor information

Jerzy Waśniewski, Jack Dongarra, Kaj Madsen, Dorte Olesen

About this paper

Cite this paper

Gupta, A., Gustavson, F.G., Joshi, M., Toledo, S. (1996). The design, implementation, and evaluation of a banded linear solver for distributed-memory parallel computers. In: Waśniewski, J., Dongarra, J., Madsen, K., Olesen, D. (eds) Applied Parallel Computing: Industrial Computation and Optimization. PARA 1996. Lecture Notes in Computer Science, vol 1184. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-62095-8_35

DOI: https://doi.org/10.1007/3-540-62095-8_35
Published: 07 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-62095-2
Online ISBN: 978-3-540-49643-4
eBook Packages: Springer Book Archive
