Abstract
Reduction operations appear frequently in algorithms. Because of their mathematical invariance properties (assuming that round-off errors can be tolerated), it is reasonable to relax the ordering constraints on the computation of reductions and thereby exploit the computing power of parallel machines.
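To make the round-off caveat concrete, consider a minimal C sketch (ours, not from the paper) in which two association orders of the same sum produce different floating-point results even though they are mathematically equal:

/* Demonstrates why reduction reordering is exact only up to round-off:
 * both evaluations compute the same mathematical sum, but the two
 * association orders can differ in floating point. */
#include <stdio.h>

#define N 6

int main(void) {
    double a[N] = {1e16, 1.0, 1.0, 1.0, 1.0, -1e16};

    /* Sequential, left-to-right evaluation. */
    double s1 = 0.0;
    for (int i = 0; i < N; i++)
        s1 += a[i];

    /* A reordered evaluation: two partial sums over the even- and
     * odd-indexed elements, combined at the end, as a parallel
     * reduction would do. */
    double p0 = 0.0, p1 = 0.0;
    for (int i = 0; i < N; i += 2) p0 += a[i];
    for (int i = 1; i < N; i += 2) p1 += a[i];
    double s2 = p0 + p1;

    printf("left-to-right: %g, reordered: %g\n", s1, s2);
    return 0;
}

With IEEE double arithmetic the two results differ (0 versus 2 for this input); this discrepancy is exactly the error a parallelizing compiler asks the user to tolerate.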
One obvious and widely used compilation approach for reductions is syntactic pattern recognition: either the source language provides explicit reduction operators, or certain specific loops are recognized as equivalent to known reductions. Once such a pattern is recognized, hand-optimized code for the reduction is incorporated into the target program. The advantage of this approach is simplicity. However, it restricts the form of reduction loops: no data dependence other than the one caused by the reduction operation itself is allowed.
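As a concrete illustration (our own C sketch, not code from the paper), the first loop below is the kind of sum reduction that syntactic pattern matching recognizes, while the second entangles an additional loop-carried dependence with the accumulation and therefore falls outside the pattern:

/* Illustrative only. Top: the sole loop-carried dependence is on
 * `sum`, so a pattern-matching compiler can substitute a parallel
 * reduction. Bottom: a[i+1] is updated from a[i], a flow dependence
 * beyond the reduction on `t`, so the pattern is rejected. */
#include <stdio.h>

#define N 8

int main(void) {
    double a[N] = {1, 2, 3, 4, 5, 6, 7, 8};

    /* Recognizable as a sum reduction. */
    double sum = 0.0;
    for (int i = 0; i < N; i++)
        sum += a[i];

    /* Not recognizable: the recurrence on a[] is entangled with the
     * accumulation into t. */
    double t = 0.0;
    for (int i = 0; i < N - 1; i++) {
        a[i + 1] += a[i];
        t += a[i + 1];
    }

    printf("sum = %g, t = %g\n", sum, t);
    return 0;
}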
In this paper, we present a parallelizing technique, the interleaving transformation, for distributed-memory parallel machines. This optimization exploits the parallelism embodied in reduction loops through a combination of data dependence analysis and region analysis. Data dependence analysis identifies the loop structures and the conditions that trigger the optimization. Region analysis divides the iteration domain into a sequential region and an order-insensitive region. Parallelism is achieved by distributing the iterations in the order-insensitive region among multiple processors. We use a triangular solver as an example to illustrate the optimization. Experimental results on various distributed-memory parallel machines, including the Connection Machine CM-5, the nCUBE, the IBM SP-2, and a network of Sun workstations, are reported.
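The abstract does not spell out the transformation, but a hypothetical serial C sketch of the region split for a lower-triangular solve Lx = b conveys the idea: each diagonal step belongs to the sequential region, while the updates that propagate a finalized x[j] to later rows commute with one another and form the order-insensitive region:

/* Hypothetical sketch of the region split for a lower-triangular
 * solve L x = b; the paper's actual interleaving transformation may
 * differ in detail. */
#include <stdio.h>

#define N 4

int main(void) {
    double L[N][N] = {
        {2, 0, 0, 0},
        {1, 3, 0, 0},
        {4, 1, 5, 0},
        {2, 2, 1, 4}
    };
    double b[N] = {2, 7, 21, 25};   /* chosen so that x = (1, 2, 3, 4) */
    double x[N];

    for (int i = 0; i < N; i++) x[i] = b[i];

    for (int j = 0; j < N; j++) {
        /* Sequential region: x[j] depends on every earlier update. */
        x[j] /= L[j][j];

        /* Order-insensitive region: each update reads only the
         * finalized x[j] and writes a distinct x[i], so the updates
         * commute and could be distributed among processors. */
        for (int i = j + 1; i < N; i++)
            x[i] -= L[i][j] * x[j];
    }

    for (int i = 0; i < N; i++)
        printf("x[%d] = %g\n", i, x[i]);
    return 0;
}

In a distributed-memory setting, the rows i > j would be partitioned among processors, each applying its share of the commuting updates to its local portion of x.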
Cite this article
Wu, JJ. An Interleaving Transformation for Parallelizing Reductions for Distributed-Memory Parallel Machines. The Journal of Supercomputing 15, 321–339 (2000). https://doi.org/10.1023/A:1008168528240