A support for non-uniform parallel loops and its application to a flame simulation code

Orlando, Salvatore; Perego, Raffaele

doi:10.1007/3-540-63138-0_17

Salvatore Orlando¹ and Raffaele Perego²

Systems and Applications
Conference paper
First Online: 01 January 2005

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1253)

Abstract

This paper presents SUPPLE (SUPport for Parallel Loop Execution), an innovative run-time support for parallel loops with regular stencil data references and non-uniform iteration costs. SUPPLE relies on a static block data distribution to exploit locality, and combines static and dynamic policies for scheduling non-uniform iterations. It adopts, as far as possible, a static scheduling policy derived from the owner-computes rule, and moves data and iterations among processors only if a load imbalance actually occurs. SUPPLE always tries to overlap communications with useful computations, by reordering loop iterations and, when the workload is imbalanced, by prefetching remote iterations. The SUPPLE approach has been validated by extensive experiments with a multi-dimensional flame simulation kernel run on a 64-node Cray T3D. We fed the benchmark code with several synthetic input data sets built on the basis of a load imbalance model, and compared our results with those obtained by a CRAFT Fortran implementation of the benchmark.
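The abstract describes SUPPLE's strategy only at a high level. The Python sketch below illustrates the three ideas it names: a static block distribution with owner-computes scheduling, iteration reordering so that halo communication can overlap with local work, and migration of iterations only once a processor runs out of work. It is not SUPPLE's implementation. The function names (`block_bounds`, `reorder_for_overlap`, `hybrid_schedule`), the 1-D index space, the fixed stencil radius, the "take a chunk from the tail of the most loaded peer" policy, and the zero-cost communication model are all assumptions made for illustration; prefetching is not modelled.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def block_bounds(n: int, p: int, rank: int) -> Tuple[int, int]:
    """Half-open range [lo, hi) of n iterations owned by `rank` under a static
    BLOCK distribution over p processors; the first n % p ranks get one extra."""
    base, extra = divmod(n, p)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi


def reorder_for_overlap(lo: int, hi: int, radius: int) -> List[int]:
    """Owner-computes order for the block [lo, hi): inner iterations, whose stencil
    of the given radius reads only local data, run first; border iterations, which
    need halo values from neighbours, run last so the halo exchange can proceed in
    the background. (Simplification: physical domain edges are treated as borders.)"""
    inner = list(range(lo + radius, hi - radius))
    border = [i for i in range(lo, hi) if i < lo + radius or i >= hi - radius]
    return inner + border


def hybrid_schedule(costs: List[float], p: int, chunk: int = 2
                    ) -> Tuple[Dict[int, List[int]], List[float]]:
    """Simulate static scheduling with on-demand migration (hypothetical policy).
    Each processor first executes its own block; a processor that runs out of work
    takes up to `chunk` (>= 1) iterations from the tail of the peer with the most
    remaining cost. Communication and data-movement costs are ignored."""
    queues: List[Deque[int]] = [deque(range(*block_bounds(len(costs), p, r)))
                                for r in range(p)]
    clock = [0.0] * p
    executed: Dict[int, List[int]] = {r: [] for r in range(p)}
    while any(queues):
        # The processor that becomes free earliest acts next.
        r = min(range(p), key=lambda k: clock[k])
        if not queues[r]:
            # Imbalance detected: migrate work from the most loaded peer.
            victim = max(range(p), key=lambda k: (sum(costs[i] for i in queues[k]),
                                                  len(queues[k])))
            for _ in range(min(chunk, len(queues[victim]))):
                queues[r].append(queues[victim].pop())
        i = queues[r].popleft()
        clock[r] += costs[i]
        executed[r].append(i)
    return executed, clock


if __name__ == "__main__":
    costs = [1.0] * 4 + [5.0] * 4  # right half is five times more expensive
    executed, clock = hybrid_schedule(costs, p=2)
    print(executed)  # {0: [0, 1, 2, 3, 7, 6], 1: [4, 5]}
    print(clock)     # [14.0, 10.0]
```

In this toy run, a purely static owner-computes schedule would give processor 1 twenty time units of work against four for processor 0; migrating two iterations only after processor 0 becomes idle shortens the simulated completion time to 14, while the balanced part of the work never moves.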

Author information

Authors and Affiliations

¹ Salvatore Orlando: Dipartimento di Mat. Appl. ed Informatica, Università Ca' Foscari di Venezia, 30173 Venezia Mestre, Italy
² Raffaele Perego: CNUCE, Consiglio Nazionale delle Ricerche (CNR), 56126 Pisa, Italy

Editor information

Gianfranco Bilardi, Afonso Ferreira, Reinhard Lüling, José Rolim

About this paper

Cite this paper

Orlando, S., Perego, R. (1997). A support for non-uniform parallel loops and its application to a flame simulation code. In: Bilardi, G., Ferreira, A., Lüling, R., Rolim, J. (eds) Solving Irregularly Structured Problems in Parallel. IRREGULAR 1997. Lecture Notes in Computer Science, vol 1253. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63138-0_17

DOI: https://doi.org/10.1007/3-540-63138-0_17
Published: 08 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63138-5
Online ISBN: 978-3-540-69157-0
eBook Packages: Springer Book Archive