Abstract
This paper presents SUPPLE (SUPport for Parallel Loop Execution), an innovative run-time support for parallel loops with regular stencil data references and non-uniform iteration costs. SUPPLE relies on a static block data distribution to exploit locality, and combines static and dynamic policies to schedule non-uniform iterations. As far as possible, it adopts a static scheduling policy derived from the owner-computes rule, and it moves data and iterations among processors only when a load imbalance actually occurs. SUPPLE always tries to overlap communication with useful computation by reordering loop iterations and by prefetching remote iterations when the workload is imbalanced. The approach has been validated experimentally by running a multi-dimensional flame simulation kernel on a 64-node Cray T3D. We fed the benchmark code with several synthetic input data sets built from a load imbalance model, and compared the results with those obtained from a CRAFT Fortran implementation of the same benchmark.
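The hybrid policy described in the abstract can be illustrated with a toy simulation. This is a minimal sketch, not SUPPLE's actual algorithm: it assumes zero communication cost, does not model prefetching or iteration reordering, and picks the processor with the most remaining work as the source of migrated iterations. The function names and the `chunk` parameter are hypothetical and chosen only for illustration.

```python
import heapq
from collections import deque


def block_ranges(n_iters, n_procs):
    """Static block distribution: processor p owns one contiguous range."""
    base, extra = divmod(n_iters, n_procs)
    ranges, start = [], 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def static_makespan(costs, n_procs):
    """Owner-computes only: each processor runs exactly its own block."""
    return max(sum(costs[a:b]) for a, b in block_ranges(len(costs), n_procs))


def hybrid_makespan(costs, n_procs, chunk=1):
    """Run owned iterations first; migrate work only when a processor is idle.

    Assumption (not from the paper): an idle processor takes up to `chunk`
    iterations from the tail of the queue with the most remaining work,
    and migration itself costs nothing.
    """
    queues = [deque(range(a, b)) for a, b in block_ranges(len(costs), n_procs)]
    events = [(0.0, p) for p in range(n_procs)]  # (time processor is free, id)
    heapq.heapify(events)
    finish = 0.0
    while events:
        t, p = heapq.heappop(events)
        if queues[p]:
            batch = [queues[p].popleft()]  # static part: own iteration
        else:
            victim = max(range(n_procs),
                         key=lambda q: sum(costs[i] for i in queues[q]))
            if not queues[victim]:
                finish = max(finish, t)  # no work left anywhere: retire
                continue
            take = min(chunk, len(queues[victim]))
            batch = [queues[victim].pop() for _ in range(take)]  # dynamic part
        heapq.heappush(events, (t + sum(costs[i] for i in batch), p))
    return finish


# Synthetic imbalance: the last block holds all the expensive iterations.
costs = [1] * 12 + [10] * 4
print(static_makespan(costs, 4), hybrid_makespan(costs, 4))
```

On this synthetic input the purely static schedule is bounded by the one overloaded block, while the hybrid schedule finishes earlier because idle processors take over part of that block. With a uniform cost vector no iteration is ever migrated, and both functions return the same makespan.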
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Orlando, S., Perego, R. (1997). A support for non-uniform parallel loops and its application to a flame simulation code. In: Bilardi, G., Ferreira, A., Lüling, R., Rolim, J. (eds) Solving Irregularly Structured Problems in Parallel. IRREGULAR 1997. Lecture Notes in Computer Science, vol 1253. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63138-0_17
DOI: https://doi.org/10.1007/3-540-63138-0_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63138-5
Online ISBN: 978-3-540-69157-0
eBook Packages: Springer Book Archive