Summary
Because sending and receiving data incurs significant communication overhead, loop partitioning approaches for distributed memory systems must guarantee not just a balanced computation load but a balanced combined computation and communication load. Previous approaches to loop partitioning have achieved a communication-free, computation-balanced iteration space partitioning for a limited subset of DOALL loops [6]. However, a large class of DOALL loops inevitably incurs communication, and for those loops the tradeoffs between computation and communication must be analyzed carefully in order to balance the combined computation time and communication overhead.
In this work, we describe a partitioning approach, based on the above motivation, for the general case of DOALL loops. Our goal is to achieve a partitioning that balances the combined computation and communication load through static distribution of data and the iteration space. First, the code partitioning phase analyzes the references in the body of the DOALL loop nest and determines a set of directions that reduce a larger degree of communication by trading away a lesser degree of parallelism. Partitioning is carried out in the iteration space of the loop by cyclically following a set of direction vectors so that data references are maximally localized and reused, eliminating a large communication volume. A new "larger partition owns" rule is formulated to minimize the communication overhead of a compute-intensive partition by localizing its references more aggressively than those of a smaller, less compute-intensive partition. A Partition Interaction Graph is then constructed and used to merge partitions, thereby adjusting granularity, balancing the combined computation and communication load, and mapping the result onto the actual number of available processors. The relevant theory and algorithms are developed, along with a performance evaluation on the Cray T3D.
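To make the merging step concrete, the following is a minimal illustrative sketch, not the authors' actual algorithm: partitions are nodes of a Partition Interaction Graph carrying computation costs, edges carry communication volumes, and merging the endpoints of an edge internalizes that communication. The greedy heaviest-edge policy and all names (`merge_partitions`, `load`) are assumptions made for this example.

```python
# Hypothetical sketch of granularity adjustment on a Partition
# Interaction Graph (PIG). Nodes are iteration-space partitions with
# computation costs; weighted edges are communication volumes between
# partitions. Merging two partitions internalizes the edge between
# them, trading parallelism for reduced communication.

def load(compute, comm, p):
    """Combined computation + communication load of partition p."""
    return compute[p] + sum(v for e, v in comm.items() if p in e)

def merge_partitions(compute, comm, num_procs):
    """Greedily merge partitions until at most num_procs remain,
    always merging across the heaviest communication edge.

    compute: {partition_id: computation cost}
    comm:    {frozenset({p, q}): communication volume}
    """
    compute = dict(compute)
    comm = dict(comm)
    while len(compute) > num_procs and comm:
        # Pick the edge carrying the most communication volume.
        edge = max(comm, key=comm.get)
        a, b = sorted(edge)
        # Merge b into a: computation costs add, the shared
        # communication edge disappears (it becomes local accesses).
        compute[a] += compute.pop(b)
        del comm[edge]
        # Redirect b's remaining edges to a, summing parallel edges.
        for e in [e for e in comm if b in e]:
            (other,) = e - {b}
            new_e = frozenset((a, other))
            comm[new_e] = comm.get(new_e, 0) + comm.pop(e)
    return compute, comm
```

For example, four partitions with compute costs {1: 10, 2: 4, 3: 6, 4: 8} and communication edges (1,2)=5, (2,3)=2, (3,4)=7, merged onto two processors, first collapse the (3,4) edge and then the (1,2) edge, yielding two partitions of combined compute cost 14 each with only the light (1,3) edge left as actual communication.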
References
D. Bau, I. Kodukula, V. Kotlyar, K. Pingali and P. Stodghill, “Solving Alignment Using Elementary Linear Algebra”, Proceedings of 7th International Workshop on Languages and Compilers for Parallel Computing, LNCS 892, 1994, pp. 46–60.
J. Anderson and M. Lam, “Global Optimizations for Parallelism and Locality on Scalable Parallel Machines”, Proceedings of SIGPLAN '93 Conference on Programming Language Design and Implementation, June 1993, pp. 112–125.
R. Bixby, K. Kennedy and U. Kremer, “Automatic Data Layout Using 0-1 Integer Programming”, Proc. Int’l Conf. on Parallel Architectures and Compilation Techniques, North-Holland, Amsterdam, 1994.
Z. Bozkus, A. Choudhary, G. Fox, T. Haupt and S. Ranka, “Compiling Fortran 90D/HPF for Distributed Memory MIMD Computers”, Journal of Parallel and Distributed Computing, Special Issue on Data Parallel Algorithms and Programming, Vol. 21, No. 1, April 1994, pp. 15–26.
S. Chatterjee, J. Gilbert, R. Schreiber and S.-H. Teng, “Automatic Array Alignment in Data Parallel Programs”, 20th ACM Symposium on Principles of Programming Languages, pp. 16–28, 1993.
T. Chen and J. Sheu, “Communication-Free Data Allocation Techniques for Parallelizing Compilers on Multicomputers”, IEEE Transactions on Parallel and Distributed Systems, Vol. 5, No. 9, September 1994, pp. 924–938.
J. T. Feo, D. C. Cann and R. R. Oldehoeft, “A Report on Sisal Language Project”, Journal of Parallel and Distributed Computing, Vol. 10, No. 4, October 1990, pp. 349–366.
A. Gerasoulis and T. Yang, “On Granularity and Clustering of Directed Acyclic Task Graphs”, IEEE Transactions on Parallel and Distributed Systems, Vol. 4, Number 6, June 1993, pp. 686–701.
M. Girkar and C. Polychronopoulos, “Automatic Extraction of Functional Parallelism from Ordinary Programs”, IEEE Transactions on Parallel and Distributed Systems, Vol. 3, No. 2, March 1992, pp. 166–178.
G. Gong, R. Gupta and R. Melhem, “Compilation Techniques for Optimizing Communication on Distributed-Memory Systems”, Proceedings of 1993 International Conference on Parallel Processing, Vol. II, pp. 39–46.
M. Gupta and P. Banerjee, “Demonstration of Automatic Data Partitioning Techniques for Parallelizing Compilers on Multicomputers”, IEEE Transactions on Parallel and Distributed Systems, Vol. 3, March 1992, pp. 179–193.
High Performance Fortran Forum. High Performance Fortran Language Specification, Version 1.0, Technical Report, CRPC-TR92225, Center for Research on Parallel Computation, Rice University, Houston, TX, 1992 (revised January 1993).
S. Hiranandani, K. Kennedy and C.-W. Tseng, “Compiling Fortran for MIMD Distributed-Memory Machines”, Communications of ACM, August 1992, Vol. 35, No. 8, pp. 66–80.
C.-H. Huang and P. Sadayappan, “Communication-Free Hyperplane Partitioning of Nested Loops”, Journal of Parallel and Distributed Computing, Vol. 19, No. 2, October 1993, pp. 90–102.
S. D. Kaushik, C.-H. Huang, R.W. Johnson and P. Sadayappan, “An Approach to Communication-Efficient Data Redistribution”, Proceedings of 1994 ACM International Conference on Supercomputing, pp. 364–373, June 1994.
C. Koelbel and P. Mehrotra, “Compiling Global Name-Space Parallel Loops for Distributed Execution”, IEEE Transactions on Parallel and Distributed Systems, October 1991, Vol. 2, No. 4, pp. 440–451.
J. Li and M. Chen, “Compiling Communication-Efficient Programs for Massively Parallel Machines”, IEEE Transactions on Parallel and Distributed Systems, July 1991, pp. 361–376.
A. Lim and M. Lam, “Communication-free Parallelization via Affine Transformations”, Proceedings of 7th International Workshop on Languages and Compilers for Parallel Computing, LNCS 892, 1994, pp. 92–106.
D. J. Palermo, E. Su, J. Chandy and P. Banerjee, “Communication Optimizations Used in the PARADIGM Compiler”, Proceedings of the 1994 International Conference on Parallel Processing, Vol. II (Software), pp. II-1–II-10.
S. S. Pande, D. P. Agrawal and J. Mauney, “A Scalable Scheduling Method for Functional Parallelism on Distributed Memory Multiprocessors”, IEEE Transactions on Parallel and Distributed Systems, Vol. 6, No. 4, April 1995, pp. 388–399.
S. S. Pande, D. P. Agrawal and J. Mauney, “Compiling Functional Parallelism on Distributed Memory Systems”, IEEE Parallel and Distributed Technology, Spring 1994, pp. 64–75.
J. Ramanujam and P. Sadayappan, “Compile-Time Techniques for Data Distribution in Distributed Memory Machines”, IEEE Transactions on Parallel and Distributed Systems, Vol. 2, No. 4, October 1991, pp. 472–482.
S. Ramaswamy, S. Sapatnekar and P. Banerjee, “A Convex Programming Approach for Exploiting Data and Functional Parallelism on Distributed Memory Multicomputers”, Proceedings of 1994 International Conference on Parallel Processing, Vol. II (Software), pp. 116–125.
A. Rogers and K. Pingali, “Process Decomposition through Locality of Reference”, Proceedings of SIGPLAN '89 Conference on Programming Language Design and Implementation, pp. 69–80.
J. Saltz, H. Berryman and J. Wu, “Multiprocessors and Run-time Compilation”, Concurrency: Practice & Experience, Vol. 3, No. 4, December 1991, pp. 573–592.
V. Sarkar and G. R. Gao, “Optimization of Array Accesses by Collective Loop Transformations”, Proceedings of 1991 ACM International Conference on Supercomputing, pp. 194–204, June 1991.
A. Sohn, M. Sato, N. Yoo and J.-L. Gaudiot, “Data and Workload Distribution in a Multi-threaded Architecture”, Journal of Parallel and Distributed Computing 40, February 1997, pp. 256–264.
A. Sohn, R. Biswas and H. Simon, “Impact of Load Balancing on Unstructured Adaptive Computations for Distributed Memory Multiprocessors”, Proc. of 8th IEEE Symposium on Parallel and Distributed Processing, New Orleans, Louisiana, Oct. 1996, pp. 26–33.
B. Sinharoy and B. Szymanski, “Data and Task Alignment in Distributed Memory Architectures”, Journal of Parallel and Distributed Computing, 21, 1994, pp. 61–74.
P. Tu and D. Padua, “Automatic Array Privatization”, Proceedings of the Sixth Workshop on Language and Compilers for Parallel Computing, August 1993.
A. Wakatani and M. Wolfe, “A New Approach to Array Redistribution: Strip Mining Redistribution”, Proceedings of PARLE '94, Lecture Notes in Computer Science, 817, pp. 323–335.
R. Wolski and J. Feo, “Program Partitioning for NUMA Multiprocessor Computer Systems”, Journal of Parallel and Distributed Computing (special issue on Performance of Supercomputers), Vol. 19, pp. 203–218, 1993.
H. Xu and L. Ni, “Optimizing Data Decomposition for Data Parallel Programs”, Proceedings of International Conference on Parallel Processing, August 1994, Vol. II, pp. 225–232.
© 2001 Springer-Verlag Berlin Heidelberg
Cite this chapter
Pande, S., Bali, T. (2001). A Compilation Method for Communication-Efficient Partitioning of DOALL Loops. In: Pande, S., Agrawal, D.P. (eds) Compiler Optimizations for Scalable Parallel Systems. Lecture Notes in Computer Science, vol 1808. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45403-9_12
Print ISBN: 978-3-540-41945-7
Online ISBN: 978-3-540-45403-8