Abstract
This paper addresses the problem of parallelizing imperfectly nested loops. Current parallelizing compilers and transformations either parallelize only the innermost loop (which is closer to vectorization than to parallelization) or do not parallelize the loops at all. We present an approach that transforms an imperfectly nested loop into at most three fully parallel, perfectly nested loops. The transformed loops can be parallelized by any parallelizing compiler. The advantages of our technique are the simplicity of the transformed loops and low synchronization overhead. We tested the feasibility of this approach on several types of loops, including loops from the Eispack math library and the Linpack benchmark, on different multiprocessor platforms, and compared performance with Sun’s MP C and Cray’s autotasking. The results show that our method is very effective.
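To make the terminology concrete, the following is a minimal sketch of generic loop fission (loop distribution) applied to a small imperfectly nested loop. It is not the transformation developed in the paper, which targets harder dependence patterns; the array names, the sizes `N` and `M`, and all function names are assumptions chosen for illustration. Statement S1 sits between the two loop headers, which makes the nest imperfect. Because S2 reads only the `s[i]` written in the same outer iteration, S1 can be moved into its own loop, leaving two loops whose iterations are all independent.

```c
#include <assert.h>
#include <stddef.h>

#define N 4   /* outer trip count (illustrative assumption) */
#define M 3   /* inner trip count (illustrative assumption) */

/* Imperfectly nested: S1 lies inside the i loop but outside the j loop. */
void imperfect(const double a[N], double c[N][M], double s[N], double b[N][M]) {
    for (size_t i = 0; i < N; i++) {
        s[i] = 2.0 * a[i];                /* S1 */
        for (size_t j = 0; j < M; j++)
            b[i][j] = s[i] + c[i][j];     /* S2 */
    }
}

/* After fission: a single loop for S1, then a perfect nest for S2.
 * Every s[i] is written before the second nest starts, so the
 * S1 -> S2 dependence is preserved and each loop has no
 * loop-carried dependence. */
void fissioned(const double a[N], double c[N][M], double s[N], double b[N][M]) {
    for (size_t i = 0; i < N; i++)
        s[i] = 2.0 * a[i];                /* S1 */

    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < M; j++)
            b[i][j] = s[i] + c[i][j];     /* S2 */
}

/* Returns 1 if both versions compute identical s and b on a small input. */
int fission_matches_original(void) {
    double a[N], c[N][M], s1[N], s2[N], b1[N][M], b2[N][M];
    for (size_t i = 0; i < N; i++) {
        a[i] = (double)(i + 1);
        for (size_t j = 0; j < M; j++)
            c[i][j] = (double)(10 * i + j);
    }
    imperfect(a, c, s1, b1);
    fissioned(a, c, s2, b2);
    for (size_t i = 0; i < N; i++) {
        if (s1[i] != s2[i]) return 0;
        for (size_t j = 0; j < M; j++)
            if (b1[i][j] != b2[i][j]) return 0;
    }
    return 1;
}

/* Small self-check that can be called from any driver. */
void self_check(void) {
    assert(fission_matches_original());
}
```

In this simple case the fission is legal because the only dependence runs forward from S1 to S2 within the same outer iteration. Loops with dependences carried across iterations need additional analysis before they can be split this way.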
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ju, J., Chaudhary, V. (1999). A Fission Technique Enabling Parallelization of Imperfectly Nested Loops. In: Banerjee, P., Prasanna, V.K., Sinha, B.P. (eds) High Performance Computing – HiPC’99. HiPC 1999. Lecture Notes in Computer Science, vol 1745. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-46642-0_13
DOI: https://doi.org/10.1007/978-3-540-46642-0_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66907-4
Online ISBN: 978-3-540-46642-0
eBook Packages: Springer Book Archive