Abstract
Parallel nested loops are the largest potential source of parallelism in numerical and scientific applications, so executing them with low run-time overhead is critical for achieving high performance on parallel computers. Guided self-scheduling (GSS) has long been used for the dynamic scheduling of parallel loops on shared-memory parallel machines and for the efficient utilization of dynamically allocated processors. To minimize the synchronization (scheduling) overhead of GSS, loop coalescing has been proposed as a restructuring technique that transforms a nested loop into a single loop; in other words, coalescing "flattens" the iteration space in lexicographic order of the indices of the original loop. Although coalescing reduces the run-time scheduling overhead, it does not necessarily minimize the makespan, i.e., the maximum finishing time, especially when the execution times (workloads) of the iterations are non-uniform, as is often the case in practice, e.g., in control-intensive applications. This is because the makespan depends directly on the workload distribution across the flattened iteration space, which in turn depends on the order in which the loop indices are coalesced. We show that coalescing, as originally proposed, can result in large makespans. In this paper, we present a loop-permutation-based approach to loop coalescing, referred to as enhanced loop coalescing, that achieves near-optimal schedules. Several examples are presented and the general technique is discussed in detail.
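The two mechanisms the abstract builds on can be illustrated concretely. The sketch below (not the authors' implementation; trip counts and processor count are illustrative) shows how coalescing drives a doubly nested loop with a single flattened index, recovering the original indices in lexicographic order, and how GSS hands out chunks of decreasing size, each being the ceiling of the remaining iterations divided by the number of processors:

```python
# A minimal sketch of loop coalescing and GSS chunk sizing,
# assuming a 2-deep rectangular loop nest with trip counts N1, N2.
from math import ceil

def coalesced_indices(N1, N2):
    """Yield (i, j) pairs of the original nest in lexicographic order,
    driven by a single flattened index k in [0, N1*N2)."""
    for k in range(N1 * N2):
        i, j = divmod(k, N2)  # recover original indices from flat index
        yield i, j

def gss_chunks(total_iters, num_procs):
    """Guided self-scheduling: each idle processor grabs ceil(R / P)
    of the R remaining iterations, so chunk sizes shrink geometrically."""
    chunks, remaining = [], total_iters
    while remaining > 0:
        c = ceil(remaining / num_procs)
        chunks.append(c)
        remaining -= c
    return chunks

print(list(coalesced_indices(3, 4))[:5])  # [(0,0), (0,1), (0,2), (0,3), (1,0)]
print(gss_chunks(12, 4))                  # [3, 3, 2, 1, 1, 1, 1]
```

Note that if the heavy iterations cluster at one end of the flattened space (e.g., when workload grows with the outer index), the large early GSS chunks can absorb a disproportionate share of the work, which is exactly the makespan problem that permuting the coalescing order is meant to mitigate.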
© 2008 Springer-Verlag Berlin Heidelberg
Kejariwal, A., Nicolau, A., Polychronopoulos, C.D. (2008). Enhanced Loop Coalescing: A Compiler Technique for Transforming Non-uniform Iteration Spaces. In: Labarta, J., Joe, K., Sato, T. (eds) High-Performance Computing. ISHPC 2005, ALPS 2006. Lecture Notes in Computer Science, vol. 4759. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77704-5_2
Print ISBN: 978-3-540-77703-8
Online ISBN: 978-3-540-77704-5