Abstract
This paper shows that QR factorization of large, sparse matrices can be performed efficiently on massively parallel SIMD (single instruction stream/multiple data stream) computers such as the Connection Machine CM-2. The problem is cast as a dataflow graph, whose nodes are mapped to a “virtual dataflow machine” in such a way that only nearest-neighbor communication is required. This virtual machine is implemented by programming the CM-2 processors to support a restricted dataflow protocol. Execution results for several test matrices show that good performance can be obtained without relying on nested dissection techniques.
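As a rough illustration of the kind of computation involved, the sketch below reduces a small matrix to upper-triangular form with Givens rotations, skipping entries that are already zero. The choice of Givens rotations, the function name `givens_qr_r`, and the dense list-of-lists storage are assumptions made for this example; the abstract does not state which orthogonal transformations are used, and the code is sequential rather than a dataflow or SIMD implementation. Each rotation touches only two rows, which hints at why such a factorization can be expressed as a graph of small, local operations.

```python
import math


def givens_qr_r(A, tol=0.0):
    """Return (R, rotations): the upper-triangular factor R of A = Q R,
    computed with Givens rotations, and the number of rotations applied.

    Hypothetical helper for illustration only; Q is not accumulated.
    """
    R = [[float(x) for x in row] for row in A]
    m, n = len(R), len(R[0])
    rotations = 0
    for j in range(min(m, n)):
        for i in range(j + 1, m):
            b = R[i][j]
            if abs(b) <= tol:
                continue  # entry is already zero: no work for this row
            a = R[j][j]
            r = math.hypot(a, b)
            c, s = a / r, b / r
            # Rotate rows j and i so that R[i][j] becomes zero.
            for k in range(j, n):
                x, y = R[j][k], R[i][k]
                R[j][k] = c * x + s * y
                R[i][k] = -s * x + c * y
            R[i][j] = 0.0  # remove rounding residue
            rotations += 1
    return R, rotations


if __name__ == "__main__":
    A = [[3.0, 0.0],
         [4.0, 5.0],
         [0.0, 12.0]]
    R, count = givens_qr_r(A)
    print(R, count)
```

In this small case the zero in the first column of the last row is skipped, so fewer rotations are needed than for a dense matrix of the same shape.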
Cite this article
Kratzer, S.G. Sparse QR factorization on a massively parallel computer. J Supercomput 6, 237–255 (1992). https://doi.org/10.1007/BF00155801
DOI: https://doi.org/10.1007/BF00155801