A support for non-uniform parallel loops and its application to a flame simulation code

  • Systems and Applications
  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1253)

Abstract

This paper presents SUPPLE (SUPport for Parallel Loop Execution), a run-time support for parallel loops with regular stencil data references and non-uniform iteration costs. SUPPLE relies on a static block data distribution to exploit locality, and combines static and dynamic policies for scheduling non-uniform iterations. Whenever possible, it adopts a static scheduling policy derived from the owner-computes rule, and it moves data and iterations among processors only if a load imbalance actually occurs. SUPPLE always tries to overlap communication with useful computation: it reorders loop iterations and, when the workload is imbalanced, prefetches remote iterations. The SUPPLE approach has been validated by extensive experiments running a multi-dimensional flame simulation kernel on a 64-node Cray T3D. We fed the benchmark code with several synthetic input data sets built from a load imbalance model, and compared our results with those obtained with a CRAFT Fortran implementation of the benchmark.
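
The central scheduling idea above, keeping the static owner-computes schedule and migrating iterations only once a processor actually runs out of work, can be illustrated with a toy discrete-event simulation. This is a minimal sketch under stated assumptions, not SUPPLE's implementation. The names `block_distribute` and `simulate` are hypothetical. The victim choice (the processor with the most unstarted iterations) and the chunk size (half of the victim's remaining iterations, taken from the tail of its block) are illustrative assumptions. Iteration costs are used only by the simulator to advance simulated time. Data movement, the overlap of communication with computation through iteration reordering, and the prefetching of remote iterations are not modelled. `migration_cost` is just a fixed delay charged to the idle processor.

```python
import heapq
from collections import deque


def block_distribute(n_iters, n_procs):
    """Static block distribution: processor p owns one contiguous range of
    iterations, which it executes under the owner-computes rule."""
    base, extra = divmod(n_iters, n_procs)
    blocks, start = [], 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks


def simulate(costs, n_procs, migrate=True, chunk_frac=0.5, migration_cost=0.0):
    """Simulate one non-uniform parallel loop (hypothetical toy model).

    costs[i] is the cost of iteration i; it is used only to advance simulated
    time, never to make scheduling decisions.  With migrate=False the schedule
    is purely static.  With migrate=True, a processor that runs out of local
    iterations takes a chunk of unstarted iterations from the tail of the
    longest queue.  Returns (makespan, executed), where executed[p] lists the
    iterations run by processor p in execution order.
    """
    queues = [deque(block) for block in block_distribute(len(costs), n_procs)]
    executed = [[] for _ in range(n_procs)]
    finish = [0.0] * n_procs
    # Event heap of (local clock, processor): always advance the earliest one.
    events = [(0.0, p) for p in range(n_procs)]
    heapq.heapify(events)
    while events:
        clock, p = heapq.heappop(events)
        if queues[p]:
            i = queues[p].popleft()  # next unstarted iteration in p's queue
            executed[p].append(i)
            heapq.heappush(events, (clock + costs[i], p))
            continue
        finish[p] = clock
        if not migrate:
            continue
        # Load imbalance detected: p is idle while others may still have work.
        victim = max(range(n_procs), key=lambda q: len(queues[q]))
        n_left = len(queues[victim])
        if n_left < 2:
            continue  # nothing worth moving; p stays idle from now on
        n_take = max(1, int(n_left * chunk_frac))
        for _ in range(n_take):
            # Move the victim's last unstarted iterations, preserving order.
            queues[p].appendleft(queues[victim].pop())
        heapq.heappush(events, (clock + migration_cost, p))
    return max(finish), executed


if __name__ == "__main__":
    # Synthetic imbalance: the first quarter of the iterations is 10x costlier.
    costs = [10.0 if i < 16 else 1.0 for i in range(64)]
    print(simulate(costs, 4, migrate=False)[0])  # static only; prints 160.0
    print(simulate(costs, 4, migrate=True)[0])   # with migration on imbalance
```

Taking iterations from the tail of the victim's block leaves the victim with the contiguous prefix it is already working through, which loosely mirrors the goal of preserving locality under a block distribution while still correcting the imbalance.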

Editor information

Gianfranco Bilardi, Afonso Ferreira, Reinhard Lüling, José Rolim

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Orlando, S., Perego, R. (1997). A support for non-uniform parallel loops and its application to a flame simulation code. In: Bilardi, G., Ferreira, A., Lüling, R., Rolim, J. (eds) Solving Irregularly Structured Problems in Parallel. IRREGULAR 1997. Lecture Notes in Computer Science, vol 1253. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63138-0_17

  • DOI: https://doi.org/10.1007/3-540-63138-0_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-63138-5

  • Online ISBN: 978-3-540-69157-0

  • eBook Packages: Springer Book Archive
