Abstract
We present a model for the parallel performance of algorithms that consist of concurrent, two-dimensional wavefronts implemented in a message passing environment. The model combines the separate contributions of computation and communication wavefronts. We validate the model on three important supercomputer systems, on up to 500 processors. We use data from a deterministic particle transport application taken from the ASCI workload, although the model is general to any wavefront algorithm implemented on a 2-D processor domain. We also use the validated model to estimate the performance and scalability of wavefront algorithms on 100-TFLOPS computer systems expected to exist within the next decade as part of the ASCI program and elsewhere. On such machines our analysis shows that, contrary to conventional wisdom, inter-processor communication performance is not the bottleneck. Single-node efficiency is the dominant factor.
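The following minimal Python sketch illustrates a pipelined 2-D wavefront. It is a generic textbook pipeline timing model and not the paper's own model. It assumes a Px x Py processor grid swept from one corner, with n_blocks work blocks pipelined through every processor. Each block costs a fixed compute time t_comp plus a fixed message time t_comm, and t_comm comes from a simple latency-plus-bandwidth cost. All function and parameter names are hypothetical. Under these assumptions the far-corner processor finishes after Px + Py - 2 + n_blocks steps. A small dependency simulation checks that closed form.

```python
"""Toy timing model for a pipelined 2-D wavefront sweep (illustrative sketch only).

Assumptions (hypothetical, not taken from the paper): uniform per-block compute
time, uniform per-block message time, and a sweep starting at processor (0, 0).
"""


def message_time(latency, n_bytes, bandwidth):
    """Assumed latency-plus-bandwidth cost of sending one message."""
    return latency + n_bytes / bandwidth


def sweep_time_closed_form(px, py, n_blocks, t_comp, t_comm):
    """Finish time of the last block on processor (px-1, py-1).

    The first block needs px + py - 1 steps to reach the far corner, and the
    remaining n_blocks - 1 blocks follow one step apart.
    """
    return (px + py - 2 + n_blocks) * (t_comp + t_comm)


def sweep_time_simulated(px, py, n_blocks, t_comp, t_comm):
    """Simulate the sweep dependencies and return the far-corner finish time.

    A block may start on (i, j) only after the same block has finished on the
    west neighbour (i-1, j) and the north neighbour (i, j-1), and after (i, j)
    has finished its own previous block.
    """
    step = t_comp + t_comm
    finish = [[0.0] * py for _ in range(px)]  # finish time of latest block per processor
    for _ in range(n_blocks):
        # Row-major order: (i-1, j) and (i, j-1) already hold this block's finish time,
        # while finish[i][j] still holds the previous block's finish time.
        for i in range(px):
            for j in range(py):
                west = finish[i - 1][j] if i > 0 else 0.0
                north = finish[i][j - 1] if j > 0 else 0.0
                finish[i][j] = max(west, north, finish[i][j]) + step
    return finish[px - 1][py - 1]


def parallel_efficiency(px, py, n_blocks, t_comp, t_comm):
    """Serial compute time divided by (processor count x parallel time)."""
    serial = px * py * n_blocks * t_comp
    parallel = sweep_time_closed_form(px, py, n_blocks, t_comp, t_comm)
    return serial / (px * py * parallel)


if __name__ == "__main__":
    t_msg = message_time(latency=10e-6, n_bytes=4096, bandwidth=100e6)
    print(sweep_time_closed_form(16, 16, 64, 1e-4, t_msg))
    print(parallel_efficiency(16, 16, 64, 1e-4, 0.0))  # 64 / 94, about 0.681
```

With t_comm = 0 the efficiency reduces to n_blocks / (Px + Py - 2 + n_blocks). In this toy model, the pipeline fill term alone therefore limits efficiency even when communication is free.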
Copyright information
© 1999 Springer-Verlag London Limited
About this paper
Cite this paper
Hoisie, A., Lubeck, O., Wasserman, H. (1999). Performance analysis of wavefront algorithms on very-large scale distributed systems. In: Cooperman, G., Jessen, E., Michler, G. (eds) Workshop on wide area networks and high performance computing. Lecture Notes in Control and Information Sciences, vol 249. Springer, London. https://doi.org/10.1007/BFb0110087
DOI: https://doi.org/10.1007/BFb0110087
Publisher Name: Springer, London
Print ISBN: 978-1-85233-642-4
Online ISBN: 978-1-84628-578-3
eBook Packages: Springer Book Archive