A Compilation Method for Communication-Efficient Partitioning of DOALL Loops

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1808)

Summary

Because sending and receiving data incurs a significant communication overhead, loop partitioning approaches for distributed memory systems must balance not just the computation load but the combined computation and communication load. Previous approaches to loop partitioning have achieved communication-free, computation-load-balanced iteration space partitioning for a limited subset of DOALL loops [6]. A large category of DOALL loops, however, inevitably incurs communication; for these loops, the tradeoffs between computation and communication must be carefully analyzed in order to balance the combined computation time and communication overhead.
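
To make the objective concrete, the balance criterion can be stated as a min-max problem (our own hedged formalization; the chapter's precise cost model may differ):

    \min_{\Pi} \; \max_{p \in \Pi} \; \bigl( T_{\mathrm{comp}}(p) + T_{\mathrm{comm}}(p) \bigr)

where \Pi ranges over candidate partitionings of the iteration space, T_comp(p) is the time to execute the iterations assigned to partition p, and T_comm(p) is the overhead of the sends and receives incurred by p's non-local references. A communication-free partitioning drives T_comm to zero but, as noted above, exists only for a restricted class of DOALL loops.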

In this work, we describe a partitioning approach, based on this motivation, that handles the general cases of DOALL loops. Our goal is to achieve a computation+communication load-balanced partitioning through static distribution of data and iteration space. First, a code partitioning phase analyzes the references in the body of the DOALL loop nest and determines a set of directions along which a large amount of communication can be eliminated by trading away a small degree of parallelism. Partitioning is carried out in the iteration space of the loop by cyclically following a set of direction vectors, so that data references are maximally localized and reused, eliminating a large communication volume. A new “larger partition owns” rule is formulated to minimize the communication overhead of a compute-intensive partition by localizing its references more aggressively than those of a smaller, less compute-intensive partition. A Partition Interaction Graph is then constructed and used to merge partitions, adjusting granularity, achieving computation+communication load balance, and mapping the partitions onto the actual number of available processors. We develop the relevant theory and algorithms and present a performance evaluation on the Cray T3D.
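
To illustrate the merging step, the following is a minimal sketch (in Python, with hypothetical names and a simple greedy heuristic of our own; the chapter develops its own theory and algorithms for this phase). Nodes of the Partition Interaction Graph carry computation estimates, edges carry communication volumes between partitions, and merging two partitions internalizes the traffic on the edge between them:

    from itertools import combinations

    # Illustrative greedy merge over a Partition Interaction Graph.
    # NOTE: a hypothetical sketch for intuition, not the chapter's algorithm.

    def external_comm(edges, p):
        # Communication volume on edges with exactly one endpoint in p.
        return sum(v for (a, b), v in edges.items() if (a == p) != (b == p))

    def worst_cost(nodes, edges):
        # Max over partitions of computation + external communication.
        return max(c + external_comm(edges, p) for p, c in nodes.items())

    def merged(nodes, edges, a, b):
        # New graph with partition b folded into a; the a-b edge becomes
        # internal, so its communication volume disappears.
        new_nodes = {p: c for p, c in nodes.items() if p != b}
        new_nodes[a] = nodes[a] + nodes[b]
        new_edges = {}
        for (x, y), v in edges.items():
            x = a if x == b else x
            y = a if y == b else y
            if x != y:
                key = (min(x, y), max(x, y))
                new_edges[key] = new_edges.get(key, 0) + v
        return new_nodes, new_edges

    def greedy_merge(nodes, edges, num_procs):
        # Merge until there is one partition per processor, each time
        # picking the merge that leaves the smallest worst combined cost.
        while len(nodes) > num_procs:
            nodes, edges = min(
                (merged(nodes, edges, a, b)
                 for a, b in combinations(sorted(nodes), 2)),
                key=lambda ne: worst_cost(*ne),
            )
        return nodes, edges

    if __name__ == "__main__":
        # Four initial partitions (computation estimates) and their traffic.
        nodes = {"P0": 10, "P1": 4, "P2": 6, "P3": 3}
        edges = {("P0", "P1"): 5, ("P1", "P2"): 2, ("P2", "P3"): 4}
        print(greedy_merge(nodes, edges, num_procs=2))

A real implementation would replace the unit-cost edge weights with machine-specific estimates of T_comm (e.g., per-message startup plus per-byte transfer costs) for the target platform.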


References

  1. D. Bau, I. Kodukula, V. Kotlyar, K. Pingali and P. Stodghill, “Solving Alignment Using Elementary Linear Algebra”, Proceedings of 7th International Workshop on Languages and Compilers for Parallel Computing, LNCS 892, 1994, pp. 46–60.

  2. J. Anderson and M. Lam, “Global Optimizations for Parallelism and Locality on Scalable Parallel Machines”, Proceedings of the SIGPLAN '93 Conference on Programming Language Design and Implementation, June 1993, pp. 112–125.

  3. R. Bixby, K. Kennedy and U. Kremer, “Automatic Data Layout Using 0-1 Integer Programming”, Proc. Int’l Conf. on Parallel Architectures and Compilation Techniques, North-Holland, Amsterdam, 1994.

  4. Z. Bozkus, A. Choudhary, G. Fox, T. Haupt and S. Ranka, “Compiling Fortran 90D/HPF for Distributed Memory MIMD Computers”, Journal of Parallel and Distributed Computing, Special Issue on Data Parallel Algorithms and Programming, Vol. 21, No. 1, April 1994, pp. 15–26.

  5. S. Chatterjee, J. Gilbert, R. Schreiber and S.-H. Teng, “Automatic Array Alignment in Data Parallel Programs”, 20th ACM Symposium on Principles of Programming Languages, pp. 16–28, 1993.

  6. T. Chen and J. Sheu, “Communication-Free Data Allocation Techniques for Parallelizing Compilers on Multicomputers”, IEEE Transactions on Parallel and Distributed Systems, Vol. 5, No. 9, September 1994, pp. 924–938.

  7. J. T. Feo, D. C. Cann and R. R. Oldehoeft, “A Report on the Sisal Language Project”, Journal of Parallel and Distributed Computing, Vol. 10, No. 4, October 1990, pp. 349–366.

  8. A. Gerasoulis and T. Yang, “On Granularity and Clustering of Directed Acyclic Task Graphs”, IEEE Transactions on Parallel and Distributed Systems, Vol. 4, No. 6, June 1993, pp. 686–701.

  9. M. Girkar and C. Polychronopoulos, “Automatic Extraction of Functional Parallelism from Ordinary Programs”, IEEE Transactions on Parallel and Distributed Systems, Vol. 3, No. 2, March 1992, pp. 166–178.

  10. G. Gong, R. Gupta and R. Melhem, “Compilation Techniques for Optimizing Communication on Distributed-Memory Systems”, Proceedings of 1993 International Conference on Parallel Processing, Vol. II, pp. 39–46.

  11. M. Gupta and P. Banerjee, “Demonstration of Automatic Data Partitioning Techniques for Parallelizing Compilers on Multicomputers”, IEEE Transactions on Parallel and Distributed Systems, Vol. 3, March 1992, pp. 179–193.

  12. High Performance Fortran Forum. High Performance Fortran Language Specification, Version 1.0, Technical Report, CRPC-TR92225, Center for Research on Parallel Computation, Rice University, Houston, TX, 1992 (revised January 1993).

  13. S. Hiranandani, K. Kennedy and C.-W. Tseng, “Compiling Fortran for MIMD Distributed-Memory Machines”, Communications of the ACM, Vol. 35, No. 8, August 1992, pp. 66–80.

  14. C.-H. Huang and P. Sadayappan, “Communication-Free Hyperplane Partitioning of Nested Loops”, Journal of Parallel and Distributed Computing, Vol. 19, No. 2, October 1993, pp. 90–102.

  15. S. D. Kaushik, C.-H. Huang, R.W. Johnson and P. Sadayappan, “An Approach to Communication-Efficient Data Redistribution”, Proceedings of 1994 ACM International Conference on Supercomputing, pp. 364–373, June 1994.

  16. C. Koelbel and P. Mehrotra, “Compiling Global Name-Space Parallel Loops for Distributed Execution”, IEEE Transactions on Parallel and Distributed Systems, October 1991, Vol. 2, No. 4, pp. 440–451.

  17. J. Li and M. Chen, “Compiling Communication-Efficient Programs for Massively Parallel Machines”, IEEE Transactions on Parallel and Distributed Systems, July 1991, pp. 361–376.

  18. A. Lim and M. Lam, “Communication-free Parallelization via Affine Transformations”, Proceedings of 7th International Workshop on Languages and Compilers for Parallel Computing, LNCS 892, 1994, pp. 92–106.

  19. D. J. Palermo, E. Su, J. Chandy and P. Banerjee, “Communication Optimizations Used in the PARADIGM Compiler”, Proceedings of the 1994 International Conference on Parallel Processing, Vol. II (Software), pp. II-1–II-10.

  20. S. S. Pande, D. P. Agrawal and J. Mauney, “A Scalable Scheduling Method for Functional Parallelism on Distributed Memory Multiprocessors”, IEEE Transactions on Parallel and Distributed Systems, Vol. 6, No. 4, April 1995, pp. 388–399.

  21. S. S. Pande, D. P. Agrawal and J. Mauney, “Compiling Functional Parallelism on Distributed Memory Systems”, IEEE Parallel and Distributed Technology, Spring 1994, pp. 64–75.

  22. J. Ramanujam and P. Sadayappan, “Compile-Time Techniques for Data Distribution in Distributed Memory Machines”, IEEE Transactions on Parallel and Distributed Systems, Vol. 2, No. 4, October 1991, pp. 472–482.

  23. S. Ramaswamy, S. Sapatnekar and P. Banerjee, “A Convex Programming Approach for Exploiting Data and Functional Parallelism on Distributed Memory Multicomputers”, Proceedings of 1994 International Conference on Parallel Processing, Vol. II (Software), pp. 116–125.

  24. A. Rogers and K. Pingali, “Process Decomposition through Locality of Reference”, Proceedings of the SIGPLAN '89 Conference on Programming Language Design and Implementation, pp. 69–80.

  25. J. Saltz, H. Berryman and J. Wu, “Multiprocessors and Run-time Compilation”, Concurrency: Practice & Experience, Vol. 3, No. 4, December 1991, pp. 573–592.

  26. V. Sarkar and G. R. Gao, “Optimization of Array Accesses by Collective Loop Transformations”, Proceedings of 1991 ACM International Conference on Supercomputing, pp. 194–204, June 1991.

  27. A. Sohn, M. Sato, N. Yoo and J.-L. Gaudiot, “Data and Workload Distribution in a Multi-threaded Architecture”, Journal of Parallel and Distributed Computing, Vol. 40, February 1997, pp. 256–264.

  28. A. Sohn, R. Biswas and H. Simon, “Impact of Load Balancing on Unstructured Adaptive Computations for Distributed Memory Multiprocessors”, Proc. of 8th IEEE Symposium on Parallel and Distributed Processing, New Orleans, Louisiana, Oct. 1996, pp. 26–33.

  29. B. Sinharoy and B. Szymanski, “Data and Task Alignment in Distributed Memory Architectures”, Journal of Parallel and Distributed Computing, Vol. 21, 1994, pp. 61–74.

  30. P. Tu and D. Padua, “Automatic Array Privatization”, Proceedings of the Sixth Workshop on Languages and Compilers for Parallel Computing, August 1993.

  31. A. Wakatani and M. Wolfe, “A New Approach to Array Redistribution: Strip Mining Redistribution”, Proceedings of PARLE '94, Lecture Notes in Computer Science, Vol. 817, pp. 323–335.

  32. R. Wolski and J. Feo, “Program Partitioning for NUMA Multiprocessor Computer Systems”, Journal of Parallel and Distributed Computing (special issue on Performance of Supercomputers), Vol. 19, pp. 203–218, 1993.

  33. H. Xu and L. Ni, “Optimizing Data Decomposition for Data Parallel Programs”, Proceedings of International Conference on Parallel Processing, August 1994, Vol. II, pp. 225–232.


Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

Cite this chapter

Pande, S., Bali, T. (2001). A Compilation Method for Communication-Efficient Partitioning of DOALL Loops. In: Pande, S., Agrawal, D.P. (eds) Compiler Optimizations for Scalable Parallel Systems. Lecture Notes in Computer Science, vol 1808. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45403-9_12

  • Print ISBN: 978-3-540-41945-7

  • Online ISBN: 978-3-540-45403-8
