Abstract
The conjugate gradient squared (CGS) algorithm is a Krylov subspace algorithm that can be used to obtain fast solutions of linear systems Ax = b whose coefficient matrix A is complex, nonsymmetric, very large, and very sparse. Using electromagnetic scattering problems as examples, this paper presents a study of the performance and scalability of this algorithm on two MIMD machines. It also proposes a modified CGS (MCGS) algorithm that reduces the synchronization overhead by a factor of two by changing the computation sequence of CGS. Both experimental and theoretical analyses are performed to investigate the impact of this modification on the overall execution time. These analyses show that CGS is faster than MCGS for small numbers of processors, while MCGS outperforms CGS as the number of processors increases. Based on this observation, a set-of-algorithms approach is proposed in which either CGS or MCGS is selected depending on the dimension N of the matrix A and the number of processors P. The set approach yields an algorithm that is more scalable than either CGS or MCGS alone. Experiments performed on a 128-processor mesh-connected Intel Paragon and on a 16-processor IBM SP2 with a multistage network indicate that MCGS is approximately 20% faster than CGS.
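As a point of reference, here is a minimal sketch of the standard (unmodified) CGS iteration in Python with NumPy, following Sonneveld's formulation. It is not the paper's parallel implementation, and the MCGS reordering is not reproduced. The comments mark the two inner products computed in each iteration; in a distributed-memory setting each one needs a global reduction, which is presumably the kind of synchronization point that MCGS reorganizes. The function name `cgs` and its parameters (`x0`, `tol`, `max_iter`) are illustrative choices, not taken from the paper.

```python
import numpy as np


def cgs(A, b, x0=None, tol=1e-10, max_iter=1000):
    """Standard conjugate gradient squared (CGS) iteration for A x = b.

    A can be anything that supports A @ v (a dense NumPy array or a
    scipy.sparse matrix). Returns (x, iterations, converged).
    """
    b = np.asarray(b, dtype=complex)
    x = np.zeros_like(b) if x0 is None else np.array(x0, dtype=complex)
    r = b - A @ x
    r_tilde = r.copy()                  # fixed "shadow" residual
    b_norm = np.linalg.norm(b) or 1.0   # guard against b == 0
    if np.linalg.norm(r) <= tol * b_norm:
        return x, 0, True

    rho_prev = 1.0
    p = q = None
    for k in range(1, max_iter + 1):
        # Inner product 1 (one global reduction in a distributed setting).
        # np.vdot conjugates its first argument, as needed for complex A.
        rho = np.vdot(r_tilde, r)
        if rho == 0:
            return x, k - 1, False      # breakdown
        if k == 1:
            u = r.copy()
            p = u.copy()
        else:
            beta = rho / rho_prev
            u = r + beta * q
            p = u + beta * (q + beta * p)
        v_hat = A @ p                   # first matrix-vector product
        # Inner product 2 (a second global reduction).
        sigma = np.vdot(r_tilde, v_hat)
        if sigma == 0:
            return x, k - 1, False      # breakdown
        alpha = rho / sigma
        q = u - alpha * v_hat
        u_hat = u + q
        x = x + alpha * u_hat
        r = r - alpha * (A @ u_hat)     # second matrix-vector product
        if np.linalg.norm(r) <= tol * b_norm:
            return x, k, True
        rho_prev = rho
    return x, max_iter, False
```

Because the iteration touches A only through matrix-vector products, the same function should work unchanged with a sparse matrix, which is the case the abstract targets.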
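The set-of-algorithms idea can be illustrated with a toy per-iteration cost model. Everything in it is hypothetical, because the abstract does not give the paper's model: the sketch simply assumes that CGS pays for two global reductions per iteration and MCGS for one, that MCGS does some extra local work per row (`t_extra`), and that each reduction costs log2(P) latency steps. The class `CostModel`, the function `choose`, and all constants are illustrative placeholders.

```python
import math
from dataclasses import dataclass


@dataclass
class CostModel:
    """Hypothetical per-iteration cost model; all constants are placeholders."""
    t_work: float   # assumed local work time per matrix row for CGS
    t_extra: float  # assumed additional local work per row for MCGS
    t_sync: float   # assumed latency of one step of a global reduction

    def sync(self, p: int) -> float:
        # Assume a tree-based reduction: log2(P) latency steps per inner product.
        return self.t_sync * math.log2(p) if p > 1 else 0.0

    def cgs_time(self, n: int, p: int) -> float:
        # Rows split evenly over P processors, two global reductions.
        return self.t_work * n / p + 2 * self.sync(p)

    def mcgs_time(self, n: int, p: int) -> float:
        # Assumed extra local work, but only one global reduction.
        return (self.t_work + self.t_extra) * n / p + self.sync(p)


def choose(n: int, p: int, model: CostModel) -> str:
    """Pick whichever algorithm the model predicts is faster for (N, P)."""
    return "CGS" if model.cgs_time(n, p) <= model.mcgs_time(n, p) else "MCGS"


model = CostModel(t_work=1.0e-7, t_extra=2.0e-8, t_sync=5.0e-5)
```

With these placeholder constants, `choose(100_000, 2, model)` returns "CGS" and `choose(100_000, 128, model)` returns "MCGS", which mirrors the qualitative crossover described in the abstract; any real crossover point would have to come from measured machine parameters.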
Cite this article
Maheswaran, M., Webb, K.J. & Siegel, H.J. MCGS: A Modified Conjugate Gradient Squared Algorithm for Nonsymmetric Linear Systems. The Journal of Supercomputing 14, 257–280 (1999). https://doi.org/10.1023/A:1008141600003