
A Loop Transformation Algorithm for Communication Overlapping

International Journal of Parallel Programming

Abstract

Overlapping communication with computation is a well-known approach to improving performance. Previous research has focused on optimizations performed by the programmer. This paper presents a compiler algorithm that automatically determines the appropriate loop indices of a given nested loop and applies loop interchange and tiling in order to overlap communication with computation. The algorithm avoids generating redundant communication by providing a framework for combining information on data dependence, communication, and reuse. The paper also describes a method of generating the messages that exchange data between processors for tiled loops on distributed memory machines. The algorithm has been implemented in our High Performance Fortran (HPF) compiler, and experimental results have shown its effectiveness on distributed memory machines such as the RISC System/6000 Scalable POWERparallel System. The paper also discusses architectural issues that affect the efficiency of this optimization.
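
To make the underlying idea concrete, the sketch below shows, in hand-written C with MPI rather than in HPF, the kind of schedule such a transformation aims to produce: a non-blocking halo exchange is started, the interior of the local block (which needs no remote data) is computed while the messages are in flight, and only the boundary points wait for communication to complete. This is a minimal illustration of communication overlapping in general, not the authors' compiler algorithm or its generated code; the block size N, the stencil() kernel, and the one-dimensional distribution are assumptions made for the example.

    /* Illustrative sketch: overlapping a halo exchange with interior
     * computation using non-blocking MPI on a 1-D block-distributed array.
     * Hypothetical example, not output of the HPF compiler described here. */
    #include <mpi.h>
    #include <stdlib.h>

    #define N 1024                     /* local block size per process (assumed) */

    static double stencil(double l, double c, double r) {
        return (l + c + r) / 3.0;      /* placeholder 3-point computation */
    }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        /* a[0] and a[N+1] are halo cells filled by the neighbours */
        double *a = calloc(N + 2, sizeof *a);
        double *b = calloc(N + 2, sizeof *b);

        MPI_Request req[4];
        /* 1. Start the halo exchange without blocking. */
        MPI_Irecv(&a[0],     1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
        MPI_Irecv(&a[N + 1], 1, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &req[1]);
        MPI_Isend(&a[1],     1, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &req[2]);
        MPI_Isend(&a[N],     1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[3]);

        /* 2. Compute the interior, which needs no remote data,
         *    while the messages are in flight. */
        for (int i = 2; i <= N - 1; i++)
            b[i] = stencil(a[i - 1], a[i], a[i + 1]);

        /* 3. Wait for the halos, then compute the two boundary points. */
        MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
        b[1] = stencil(a[0], a[1], a[2]);
        b[N] = stencil(a[N - 1], a[N], a[N + 1]);

        free(a); free(b);
        MPI_Finalize();
        return 0;
    }

In the paper's setting the same effect is obtained automatically: interchanging and tiling the loop nest lets the compiler issue the messages needed by a later tile while the current tile is being computed, rather than requiring the programmer to restructure the loop and insert the non-blocking calls by hand as above.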





Cite this article

Ishizaki, K., Komatsu, H. & Nakatani, T. A Loop Transformation Algorithm for Communication Overlapping. International Journal of Parallel Programming 28, 135–154 (2000). https://doi.org/10.1023/A:1007554715418
