
A Loop Transformation Algorithm for Communication Overlapping

International Journal of Parallel Programming

Abstract

Overlapping communication with computation is a well-known approach to improving performance. Previous research has focused on optimizations performed by the programmer. This paper presents a compiler algorithm that automatically determines the appropriate loop indices of a given nested loop and applies loop interchange and tiling in order to overlap communication with computation. The algorithm avoids generating redundant communication by providing a framework for combining information on data dependence, communication, and reuse. The paper also describes a method of generating the messages that exchange data between processors for tiled loops on distributed memory machines. The algorithm has been implemented in our High Performance Fortran (HPF) compiler, and experimental results have shown its effectiveness on distributed memory machines such as the RISC System/6000 Scalable POWERparallel System. The paper also discusses architectural issues that affect the efficiency of this optimization.
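
To make the underlying idea concrete, the sketch below shows, in hand-written C with MPI rather than in HPF, the kind of schedule such a transformation aims to produce: a non-blocking halo exchange is started, the interior of the local block (which needs no remote data) is computed while the messages are in flight, and only the boundary points wait for communication to complete. This is a minimal illustration of communication overlapping in general, not the authors' compiler algorithm or its generated code; the block size N, the stencil() kernel, and the one-dimensional distribution are assumptions made for the example.

    /* Illustrative sketch: overlapping a halo exchange with interior
     * computation using non-blocking MPI on a 1-D block-distributed array.
     * Hypothetical example, not output of the HPF compiler described here. */
    #include <mpi.h>
    #include <stdlib.h>

    #define N 1024                     /* local block size per process (assumed) */

    static double stencil(double l, double c, double r) {
        return (l + c + r) / 3.0;      /* placeholder 3-point computation */
    }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        /* a[0] and a[N+1] are halo cells filled by the neighbours */
        double *a = calloc(N + 2, sizeof *a);
        double *b = calloc(N + 2, sizeof *b);

        MPI_Request req[4];
        /* 1. Start the halo exchange without blocking. */
        MPI_Irecv(&a[0],     1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
        MPI_Irecv(&a[N + 1], 1, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &req[1]);
        MPI_Isend(&a[1],     1, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &req[2]);
        MPI_Isend(&a[N],     1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[3]);

        /* 2. Compute the interior, which needs no remote data,
         *    while the messages are in flight. */
        for (int i = 2; i <= N - 1; i++)
            b[i] = stencil(a[i - 1], a[i], a[i + 1]);

        /* 3. Wait for the halos, then compute the two boundary points. */
        MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
        b[1] = stencil(a[0], a[1], a[2]);
        b[N] = stencil(a[N - 1], a[N], a[N + 1]);

        free(a); free(b);
        MPI_Finalize();
        return 0;
    }

In the paper's setting the same effect is obtained automatically: interchanging and tiling the loop nest lets the compiler issue the messages needed by a later tile while the current tile is being computed, rather than requiring the programmer to restructure the loop and insert the non-blocking calls by hand as above.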





Cite this article

Ishizaki, K., Komatsu, H. & Nakatani, T. A Loop Transformation Algorithm for Communication Overlapping. International Journal of Parallel Programming 28, 135–154 (2000). https://doi.org/10.1023/A:1007554715418
