Parallel Computing

Volume 29, Issues 11–12, November–December 2003, Pages 1669-1684

Fast solution of large N×N matrix equations in an MIMD–SIMD Hybrid System

https://doi.org/10.1016/j.parco.2003.05.011

Abstract

In this paper, we propose a new high-speed computation algorithm for solving a large N×N matrix system using the MIMD–SIMD Hybrid System. The MIMD–SIMD Hybrid System (also denoted simply as the Hybrid System in this paper) is a new parallel architecture consisting of a combination of a Cluster of Workstations (COWs) and SIMD systems working concurrently to produce an optimal parallel computation. We first introduce our prototype SIMD system and our Hybrid System setup before presenting how the system can be used to find the unknowns in a large N×N linear matrix equation system using the Gauss–LU algorithm. This algorithm essentially takes a ‘Divide and Conquer’ approach, breaking the large N×N matrix system down into manageable 32×32 matrices for fast computation.
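The ‘Divide and Conquer’ partitioning described above can be sketched as follows. This is a hypothetical, plain-Python illustration (not the authors' implementation): a large N×N matrix is tiled into 32×32 blocks, each small enough for a 32×32 SIMD processor array.

```python
# Hypothetical sketch: tile an N x N matrix into B x B blocks (B = 32 here,
# matching the 32 x 32 sub-matrix size used in the paper's Hybrid System).

def tile(matrix, block=32):
    """Split a square matrix (list of lists) into a grid of block x block tiles.
    For simplicity, this sketch assumes N is a multiple of `block`."""
    n = len(matrix)
    assert n % block == 0, "sketch assumes N is a multiple of the block size"
    grid = []
    for bi in range(0, n, block):
        row = []
        for bj in range(0, n, block):
            # Extract one block x block tile starting at (bi, bj).
            row.append([r[bj:bj + block] for r in matrix[bi:bi + block]])
        grid.append(row)
    return grid

# Example: a 64 x 64 matrix becomes a 2 x 2 grid of 32 x 32 tiles.
A = [[float(i * 64 + j) for j in range(64)] for i in range(64)]
tiles = tile(A, 32)
```

Each tile can then be dispatched independently, which is what makes the block size "manageable" for the SIMD array.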

Introduction

Matrix computation has been widely used as a mathematical tool for solving large linear equations in image processing and visualization, as well as in other scientific applications. In recent years, there has been increased interest in parallel solutions of large-scale applications [4], [12], and it has become clear that software packages and library routines need to be developed to keep pace with the demand. Consequently, new algorithms that allow for fast processing of linear solutions began to emerge. Library routines such as LINPACK [9], [10], later superseded by LAPACK [2], [3], were introduced, and a scalable parallel version, SCALAPACK [7], was also developed and added to the list. Industry also uses these library routines as benchmarking software to measure the performance of individual computers or parallel clusters of computers. However, parallel library routines for finding the unknowns in large N×N matrix calculations were mainly written for the distributed-memory Multiple Instruction Multiple Data (MIMD) programming model, with so-called scalable distributed-memory hardware in mind. They are not suitable for the Hybrid System configuration, a combination of MIMD and Single Instruction Multiple Data (SIMD). As such, there is a need to develop or modify algorithms to make them suitable for solving large N×N matrices on the Hybrid System. This paper presents and evaluates suitably modified algorithms.

Section snippets

Related work

In the past, many researchers have developed matrix computation algorithms using either fine-grain hypercube parallel architectures or distributed-memory architectures. Johnson [14] proposed using the Gauss–Jordan algorithm for solving the linear matrix equation AX=B, where A, B and X are square matrices. The author’s algorithm assumed that the number of processors in the hypercube matches the number of elements in the matrix. Chu et al. [8] proposed a parallel algorithm for computing…
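For reference, the Gauss–Jordan approach that Johnson's hypercube algorithm parallelizes can be illustrated sequentially. This is a compact, hypothetical Python sketch of plain Gauss–Jordan elimination on the augmented system [A | B], not a reconstruction of the parallel algorithm itself:

```python
# Sequential Gauss-Jordan elimination for AX = B with square A, B, X.
# Hypothetical illustration only; uses partial pivoting via row swaps.

def gauss_jordan_solve(A, B):
    """Return X solving AX = B; A and B are n x n lists of lists."""
    n = len(A)
    # Work on an augmented copy [A | B] so A and B are left untouched.
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        # Partial pivoting: swap in the row with the largest entry in this column.
        p = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[p] = M[p], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]      # normalize the pivot row
        for r in range(n):
            if r != col:                          # Jordan step: clear the column
                f = M[r][col]                     # everywhere, above and below
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]                 # right half is X

# Example: with B = identity, X is the inverse of A.
X = gauss_jordan_solve([[2.0, 1.0], [1.0, 3.0]], [[1.0, 0.0], [0.0, 1.0]])
```

In the hypercube setting each matrix element maps to a processor, so the row-normalization and elimination loops above become simultaneous element-wise updates.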

MIMD–SIMD Hybrid System

Essentially, our Hybrid System is an extended form of parallel architecture in that it is a combination of both SIMD and MIMD systems.

The concept of the Hybrid System architecture is to make use of standard ‘Off-The-Shelf’ components, thereby making the hardware easily maintainable and low in cost. We have implemented such a prototype Hybrid System and have named this new cluster the Hybrid 4PE Cluster. As the name implies, the Hybrid 4PE Cluster involves 4 PCs networked…

Gauss–LU algorithm on the Hybrid System

The challenge in implementing an algorithm that can be effective on the Hybrid System is to allow for parallelism of both MIMD and SIMD at the same time. In so doing, we obtain Double Parallelization, i.e. both the MIMD and the SIMD systems compute at the same time. Also, the problem size of the algorithm has to be relatively large; otherwise there is not enough work for the SIMD system to process and the approach is not performance-effective. The fact that there…
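The Double Parallelization described above is essentially a producer/consumer overlap: while one stage eliminates the current block, the other prepares the next. A minimal sketch, assuming a single FIFO queue and using a Python thread as a stand-in for the SIMD side (the names `simd_worker` and the sentinel protocol are illustrative, not from the paper):

```python
# Hypothetical sketch of "Double Parallelization": a worker thread stands in
# for the SIMD array and consumes prepared blocks, while the main (MIMD-side)
# thread keeps preparing and handing off the next blocks, so the two stages
# overlap in time.

import queue
import threading

def simd_worker(tasks, results):
    """Consume prepared blocks from the queue and 'eliminate' them."""
    while True:
        block = tasks.get()
        if block is None:                  # sentinel: no more blocks to process
            break
        results.append(("eliminated", block))

tasks, results = queue.Queue(), []
t = threading.Thread(target=simd_worker, args=(tasks, results))
t.start()
for k in range(4):                         # MIMD side: prepare block k ...
    tasks.put(k)                           # ... and hand it to the SIMD stage
tasks.put(None)                            # signal completion
t.join()
```

If block preparation is too cheap relative to the handoff, the worker starves, which mirrors the paper's point that the problem size must be large enough to keep the SIMD side busy.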

Performance test results on the Hybrid System

In solving the N×N linear matrix equations using the Gauss–LU algorithm (an integration of the modified GJE and LU factorization algorithms), we have assumed that the inverse of each sub-matrix along the diagonal can be computed. A salient point of the modified GJE algorithm is that it allows both Data-parallelism and Function-parallelism to exist at the same time. Effectively, while the MIMD system prepares matrices and computes the inverse of a sub-matrix for the next pass,…
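The role of the diagonal sub-matrix inverse can be illustrated at the block level. In this hypothetical sketch (2×2 blocks stand in for the paper's 32×32 sub-matrices, and all names are illustrative), one elimination pass uses inv(A11) to clear the sub-diagonal block, leaving the Schur complement for the next pass:

```python
# Hypothetical block-level view of one elimination pass: the inverse of the
# diagonal sub-matrix A11 (computed on the MIMD side in the paper's scheme)
# eliminates A21, leaving the Schur complement A22 - A21 * inv(A11) * A12.

def matmul(X, Y):
    """Dense matrix product for small lists-of-lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inv2(M):
    """Inverse of a 2 x 2 matrix (the sub-matrix is assumed invertible)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A11 = [[4.0, 1.0], [2.0, 3.0]]
A12 = [[1.0, 0.0], [0.0, 1.0]]
A21 = [[2.0, 0.0], [0.0, 2.0]]
A22 = [[5.0, 0.0], [0.0, 5.0]]

inv_A11 = inv2(A11)                          # next-pass inverse (MIMD side)
update = matmul(A21, matmul(inv_A11, A12))   # elimination term (SIMD side)
schur = [[A22[i][j] - update[i][j] for j in range(2)] for i in range(2)]
```

The two lines computing `inv_A11` and `update` are independent across passes, which is exactly the overlap the modified GJE algorithm exploits.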

Conclusion

In this paper, we have introduced our new low-cost parallel architecture, the Hybrid System, which integrates the SIMD parallel architecture with a Cluster of Workstations (an MIMD configuration). We have also introduced our prototype Systola 1024 and have used it to set up the Hybrid System, Hybrid 4PE. In this configuration, the Systola 1024 serves as the SIMD component incorporated in each of the workstations in the Cluster nodes. We have also successfully implemented a parallel algorithm…

References (21)

  • O. Beaumont et al., Dense linear algebra kernels on heterogeneous platforms: redistribution issue, Journal of Parallel Computing (2002)
  • E. Chu et al., Parallel matrix inversion on a subcube-grid, Journal of Parallel Computing (1993)
  • K. Li, Scalable parallel matrix multiplication on distributed memory parallel computers, Journal of Parallel and Distributed Computing (2001)
  • A.K. Amoura, E. Bampis, J.-C. Konig, Efficient algorithms for parallel Gaussian Elimination on distributed memory...
  • E. Anderson et al., LAPACK User’s Guide (1999)
  • E. Anderson, J.J. Dongarra, S. Ostrouchov, Installation guide for LAPACK, Computer Science Department, Technical Report...
  • Z. Bai et al., The spectral decomposition of non-symmetric matrices on distributed memory parallel computers, SIAM Journal on Scientific Computing (1997)
  • O. Beaumont et al., Matrix multiplication on heterogenous platforms, IEEE Transactions on Parallel and Distributed Systems (2001)
  • J. Choi, J. Dongarra, R. Pozo, D.W. Walker, SCALAPACK: a scalable linear algebra for distributed memory concurrent...
  • J.J. Dongarra, Performance of various computers using standard linear equations software in a Fortran environment,...