
Abstract

There are many algorithms for the space-time mapping of nested loops, some of which even make optimal choices within their framework. We propose a preprocessing phase for algorithms in the polytope model, which extends the model and yields space-time mappings whose schedule is, in some cases, orders of magnitude faster. These are cases in which the dependence graph has small irregularities. The basic idea is to split the index set of the loop nest into parts with a regular dependence structure and to apply the existing space-time mapping algorithms to these parts individually. This work builds on a seminal idea proposed in the more limited context of loop parallelization at the code level. We elevate the idea to the model level (our model is the polytope model), which increases its applicability by providing a clearer and wider range of choices at an acceptable analysis cost. Index set splitting is one facet of the effort to extend the power of the polytope model and to enable the generation of competitive target code.
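The idea of splitting an index set at a point where the dependence structure changes can be illustrated on a small example. The loop below is our own hypothetical illustration, not taken from the paper: iteration i reads A[N-1-i], so the second half of the index set reads values written by the first half, while neither half carries dependences within itself. Splitting the index set at N/2 therefore yields two parts that are each fully parallel, with only the order between the parts to be preserved.

```python
# Hypothetical illustration of index set splitting (example and names are ours).
# Iteration i reads A[N-1-i]: for i < N/2 it reads untouched elements, for
# i >= N/2 it reads values produced by the first half. So each half is free of
# internal dependences and can run in parallel once the index set is split.

N = 8

def original(A):
    # Sequential reference version with the irregular dependence structure.
    for i in range(N):
        A[i] = A[N - 1 - i] + 1
    return A

def split(A):
    # Index set split at N // 2: each part has a regular (empty) internal
    # dependence structure; only the order between the parts matters.
    half = N // 2
    for i in range(half):           # part 1: reads only untouched elements
        A[i] = A[N - 1 - i] + 1
    for i in range(half, N):        # part 2: reads only values part 1 produced
        A[i] = A[N - 1 - i] + 1
    return A

def split_permuted(A):
    # Running each part in reverse iteration order still gives the same result,
    # demonstrating that the iterations within each part are independent.
    half = N // 2
    for i in reversed(range(half)):
        A[i] = A[N - 1 - i] + 1
    for i in reversed(range(half, N)):
        A[i] = A[N - 1 - i] + 1
    return A

assert original(list(range(N))) == split(list(range(N)))
assert original(list(range(N))) == split_permuted(list(range(N)))
```

In an actual polytope-model tool, the split point would be derived from the dependence analysis rather than written by hand, and each part would then be scheduled separately by the existing space-time mapping algorithms.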




Griebl, M., Feautrier, P. & Lengauer, C. Index Set Splitting. International Journal of Parallel Programming 28, 607–631 (2000). https://doi.org/10.1023/A:1007516818651
