Enhanced Loop Coalescing: A Compiler Technique for Transforming Non-uniform Iteration Spaces

Conference paper in High-Performance Computing (ISHPC 2005, ALPS 2006)

Abstract

Parallel nested loops are the largest potential source of parallelism in numerical and scientific applications. Executing parallel loops with low run-time overhead is therefore critical for achieving high performance on parallel computers. Guided self-scheduling (GSS) has long been used for dynamic scheduling of parallel loops on shared-memory parallel machines and for efficient utilization of dynamically allocated processors. To minimize the synchronization (or scheduling) overhead in GSS, loop coalescing has been proposed as a restructuring technique that transforms nested loops into a single loop. In other words, coalescing "flattens" the iteration space in the lexicographic order of the indices of the original loop nest. Although coalescing helps reduce run-time scheduling overhead, it does not necessarily minimize the makespan, i.e., the maximum finishing time, especially when the execution time (workload) of iterations is non-uniform, as is often the case in practice, e.g., in control-intensive applications. This is because the makespan depends directly on how the workload is distributed across the flattened iteration space, and that distribution in turn depends on the order in which the loop indices are coalesced. We show that coalescing as originally proposed can result in large makespans. In this paper, we present a loop permutation-based approach to loop coalescing, referred to as enhanced loop coalescing, to achieve near-optimal schedules. Several examples are presented and the general technique is discussed in detail.
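To make the abstract's central point concrete, the following Python sketch coalesces a two-deep loop nest into a single flat index, schedules the flattened iterations with a simple guided self-scheduling simulation (each idle processor takes ⌈R/P⌉ of the R remaining iterations), and compares the makespan for the two possible coalescing orders. It is an illustration under stated assumptions, not the paper's enhanced coalescing algorithm: the workload function `heavy_first`, the helper names `coalesce`, `flattened_costs` and `gss_makespan`, zero scheduling overhead, all processors starting at time 0, and both loops being fully parallel (so any interchange is legal) are hypothetical choices made for this example.

```python
import heapq
from math import ceil
from typing import Callable, List, Sequence, Tuple


def coalesce(bounds: Sequence[int]) -> Callable[[int], Tuple[int, ...]]:
    """Return a function mapping a flat index k in [0, prod(bounds)) back to
    the loop indices (i1, ..., in), enumerated in lexicographic order."""
    def unflatten(k: int) -> Tuple[int, ...]:
        idx = []
        for b in reversed(bounds):  # innermost loop index varies fastest
            idx.append(k % b)
            k //= b
        return tuple(reversed(idx))
    return unflatten


def flattened_costs(bounds: Sequence[int], perm: Sequence[int],
                    work: Callable[..., int]) -> List[int]:
    """Per-iteration costs of the coalesced loop when the original loops are
    nested in the order given by perm (outermost first) before coalescing."""
    unflatten = coalesce([bounds[p] for p in perm])
    total = 1
    for b in bounds:
        total *= b
    costs = []
    for k in range(total):
        permuted = unflatten(k)
        orig = [0] * len(bounds)
        for pos, loop in enumerate(perm):
            orig[loop] = permuted[pos]  # undo the permutation
        costs.append(work(*orig))
    return costs


def gss_makespan(costs: List[int], p: int) -> int:
    """Simulate GSS over the flattened iterations: whenever a processor
    becomes idle, it takes the next ceil(R / p) of the R remaining
    iterations. Scheduling overhead is ignored (assumption)."""
    idle = [(0, proc) for proc in range(p)]  # (time processor is free, id)
    heapq.heapify(idle)
    start, remaining, makespan = 0, len(costs), 0
    while remaining > 0:
        t, proc = heapq.heappop(idle)  # earliest-idle processor grabs a chunk
        chunk = ceil(remaining / p)
        end = t + sum(costs[start:start + chunk])
        start += chunk
        remaining -= chunk
        makespan = max(makespan, end)
        heapq.heappush(idle, (end, proc))
    return makespan


def heavy_first(i: int, j: int) -> int:
    """Hypothetical non-uniform workload: row i == 0 is 10x more expensive."""
    return 10 if i == 0 else 1


if __name__ == "__main__":
    bounds = (4, 4)  # for i in range(4): for j in range(4): ...
    for perm, label in [((0, 1), "i outer"), ((1, 0), "j outer")]:
        costs = flattened_costs(bounds, perm, heavy_first)
        print(label, gss_makespan(costs, p=2))
```

With two processors this prints a makespan of 44 for the i-outer order and 26 for the j-outer order. In the i-outer order, all the expensive iterations (i = 0) fall into the first, largest GSS chunk; interchanging the loops before coalescing spreads them across chunks, and here the result equals the lower bound of total work divided by P (52 / 2 = 26). In this toy case a better coalescing order is easy to spot by hand; the paper discusses a general permutation-based technique.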

Editor information

Jesús Labarta, Kazuki Joe, Toshinori Sato

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kejariwal, A., Nicolau, A., Polychronopoulos, C.D. (2008). Enhanced Loop Coalescing: A Compiler Technique for Transforming Non-uniform Iteration Spaces. In: Labarta, J., Joe, K., Sato, T. (eds) High-Performance Computing. ISHPC ALPS 2005 2006. Lecture Notes in Computer Science, vol 4759. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77704-5_2

  • DOI: https://doi.org/10.1007/978-3-540-77704-5_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-77703-8

  • Online ISBN: 978-3-540-77704-5

  • eBook Packages: Computer Science, Computer Science (R0)
