DOI: 10.1145/1148109.1148117
Article

A general approach for partitioning N-dimensional parallel nested loops with conditionals

Published: 30 July 2006

ABSTRACT

Parallel loops account for the greatest amount of parallelism in scientific and numerical codes. For example, most of the DO loops in SPEC CFP2000 and SPEC OMPM2001 are of DOALL type and account for a large percentage of the total execution time. One way to exploit this parallelism is to partition the iteration space of a DOALL loop among the processors of a parallel processor system. A good partitioning is key both to achieving high performance and to using multiprocessor systems efficiently. Although a significant amount of work has been done on partitioning and scheduling loops with both rectangular and non-rectangular iteration spaces, to the best of our knowledge the problem of partitioning loops with conditionals has not been addressed so far. In this paper, we present a mathematical model for partitioning parallel nested loops, both perfect and non-perfect, with conditionals, where the expressions in a conditional are affine functions of the outer loop indices. We present a loop transformation based on elimination of redundant constraints bounding the iteration space of a nested loop. The transformation plays a critical role during the (static) partitioning process because it helps capture the "exact" lower and upper bounds of the loop indices, each of which can be either constant or symbolic. We generate a canonical form of the loop nest using the transformation and employ the geometric approach we proposed earlier (in [1, 2]) for partitioning the iteration space along an axis corresponding to the outermost loop. For cases in which such a transformation does not exist, we propose a general approach for loop canonicalization. We present several examples from the literature and numerical packages to illustrate the effectiveness of our approach.
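
To make the idea concrete, the following minimal Python sketch works through a hypothetical two-level DOALL nest whose body is guarded by the affine conditional `j <= i`. It is an illustration under stated assumptions, not the paper's algorithm: the loop nest, the function names, and the prefix-sum splitting rule are all invented here, and the prefix-sum split stands in for the geometric partitioning of [1, 2]. The sketch folds the conditional into the inner loop bounds, drops the redundant bound `j <= n`, and then splits the outermost axis into contiguous ranges of roughly equal work.

```python
from bisect import bisect_left
from itertools import accumulate


def count_with_conditional(n):
    """Brute-force work per outer iteration i of the (hypothetical) nest:

        doall i = 1, n
          doall j = 1, n
            if (j <= i) body(i, j)
    """
    return [sum(1 for j in range(1, n + 1) if j <= i) for i in range(1, n + 1)]


def canonical_inner_bounds(i, n):
    """Inner-loop bounds after folding `j <= i` into the loop header.

    The constraints on j are 1 <= j <= n and j <= i. Because i <= n,
    the bound j <= n is redundant, so the tight upper bound is i.
    """
    return 1, i


def count_canonical(n):
    """Work per outer iteration, computed from the canonical bounds only."""
    counts = []
    for i in range(1, n + 1):
        lo, hi = canonical_inner_bounds(i, n)
        counts.append(max(0, hi - lo + 1))
    return counts


def partition_outer(counts, p):
    """Split outer indices 1..n into p contiguous (start, end) ranges.

    Range k ends at the first outer index whose cumulative work reaches
    k/p of the total. A range with start > end is empty.
    """
    prefix = list(accumulate(counts))
    total = prefix[-1]
    scaled = [s * p for s in prefix]  # integer comparison, no rounding
    chunks, start = [], 1
    for k in range(1, p + 1):
        end = len(counts) if k == p else bisect_left(scaled, total * k) + 1
        chunks.append((start, end))
        start = end + 1
    return chunks


if __name__ == "__main__":
    n, p = 12, 3
    counts = count_canonical(n)
    # The canonical bounds must describe the same iteration space.
    assert counts == count_with_conditional(n)
    for lo, hi in partition_outer(counts, p):
        print(f"i in [{lo}, {hi}]: {sum(counts[lo - 1:hi])} iterations")
```

For `n = 12` and `p = 3`, this split assigns 28, 27, and 23 iterations to the three ranges, whereas cutting the outer axis into three equal-width ranges would assign 10, 26, and 42.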

References

  1. A. Kejariwal, A. Nicolau, U. Banerjee, and C. D. Polychronopoulos. A novel approach for partitioning iteration spaces with variable densities. In Proceedings of the 10th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 120--131, Chicago, IL, 2005.
  2. A. Kejariwal, P. D'Alberto, A. Nicolau, and C. D. Polychronopoulos. A geometric approach for partitioning N-dimensional non-rectangular iteration spaces. In Proceedings of the 17th International Workshop on Languages and Compilers for Parallel Computing, pages 102--116, West Lafayette, IN, 2004.
  3. M. R. Haghighat and C. D. Polychronopoulos. Symbolic analysis for parallelizing compilers. ACM Transactions on Programming Languages and Systems, 18(4):477--518, July 1996.
  4. R. Sakellariou. On the Quest for Perfect Load Balance in Loop-Based Parallel Computations. PhD thesis, Department of Computer Science, University of Manchester, October 1996.
  5. C. Polychronopoulos, D. J. Kuck, and D. A. Padua. Execution of parallel loops on parallel processor systems. In Proceedings of the 1986 International Conference on Parallel Processing, pages 519--527, August 1986.
  6. E. H. D'Hollander. Partitioning and labeling of loops by unimodular transformations. IEEE Transactions on Parallel and Distributed Systems, 3(4):465--476, 1992.
  7. S. Lundstrom and G. Barnes. A controllable MIMD architecture. In Proceedings of the 1980 International Conference on Parallel Processing, St. Charles, IL, August 1980.
  8. C. Polychronopoulos. Loop coalescing: A compiler transformation for parallel machines. In Proceedings of the 1987 International Conference on Parallel Processing, pages 235--242, August 1987.
  9. J. Foley, A. van Dam, S. Feiner, and J. Hughes. Computer Graphics: Principles and Practice. Addison-Wesley, 2nd edition in C, 1990.
  10. P. Anninos. Computational cosmology: From the early universe to the large scale structure. In Living Reviews in Relativity 4, 2001.
  11. M. O'Boyle and G. A. Hedayat. Load balancing of parallel affine loops by unimodular transformations. Technical Report UMCS-92-1-1, Department of Computer Science, University of Manchester, January 1992.
  12. R. Blumofe, C. Joerg, B. Kuszmaul, C. Leiserson, K. Randall, and Y. Zhou. Cilk: An efficient multithreaded runtime system. In Proceedings of the 5th Symposium on Principles and Practice of Parallel Programming, 1995.
  13. H. Saito, N. Stavrakos, and C. D. Polychronopoulos. Multithreading runtime support for loop and functional parallelism. In Proceedings of the Second International Symposium on High Performance Computing, pages 133--144, 1999.
  14. J. B. J. Fourier. Solution d'une question particulière du calcul des inégalités. In Oeuvres II, pages 317--328. 1826.
  15. L. L. Dines. System of linear inequalities. Annals of Mathematics, 20:191--199, 1919.
  16. L. L. Dines and N. H. McCoy. On linear inequalities. Transactions of the Royal Society of Canada, 27:217--232, 1933.
  17. T. S. Motzkin. Beiträge zur Theorie der linearen Ungleichungen. PhD thesis, University of Basel, 1936.
  18. H. W. Kuhn. Solvability and consistency for linear equations and inequalities. American Mathematical Monthly, 63:217--232, 1956.
  19. S. N. Chernikov. The solution of linear programming problems by elimination of unknowns. Doklady Akademii Nauk SSSR, 139:1314--1317, 1961.
  20. G. Dantzig. Linear Programming and Extensions. Princeton University Press, Princeton, NJ, 1963.
  21. G. B. Dantzig and B. C. Eaves. Fourier-Motzkin elimination and its dual. Journal of Combinatorial Theory (A), 14(3):288--297, 1973.
  22. R. J. Duffin. On Fourier's analysis of linear inequality systems. In Mathematical Programming Study 1, pages 71--95. North-Holland, 1974.
  23. H. P. Williams. Fourier-Motzkin elimination extension to integer programming problems. Journal of Combinatorial Theory (A), 21(1):118--123, 1976.
  24. U. Banerjee. Loop Transformation for Restructuring Compilers. Kluwer Academic Publishers, Boston, MA, 1993.
  25. A. Bik and H. Wijshoff. Implementation of Fourier-Motzkin elimination. Technical Report TR94-42, Department of Computer Science, University of Leiden, The Netherlands, 1994.
  26. W. Pugh. A practical algorithm for exact array dependence analysis. Communications of the ACM, 35(8):102--114, August 1992.
  27. LEDAS Geometric Solver. http://lgs.ledas.com/features.php.
  28. D. Kuck, A. H. Sameh, R. Cytron, A. Veidenbaum, C. D. Polychronopoulos, G. Lee, T. McDaniel, B. R. Leasure, C. Beckman, J. R. B. Davies, and C. P. Kruskal. The effects of program restructuring, algorithm change and architecture choice on program performance. In Proceedings of the 1984 International Conference on Parallel Processing, pages 129--138, August 1984.
  29. M. J. Wolfe. Optimizing Supercompilers for Supercomputers. The MIT Press, Cambridge, MA, 1989.
  30. D. A. Padua and M. J. Wolfe. Advanced compiler optimizations for supercomputers. Communications of the ACM, 29(12):1184--1201, December 1986.
  31. F. Irigoin and R. Triolet. Supernode partitioning. In Proceedings of the Fifteenth Annual ACM Symposium on the Principles of Programming Languages, San Diego, CA, January 1988.
  32. C. P. Kruskal and A. Weiss. Allocating independent subtasks on parallel processors. IEEE Transactions on Software Engineering, 11(10):1001--1016, 1985.
  33. M. Haghighat and C. Polychronopoulos. Symbolic program analysis and optimization for parallelizing compilers. In Proceedings of the Fifth Workshop on Languages and Compilers for Parallel Computing, New Haven, CT, August 1992.
  34. M. J. Wolfe and C. W. Tseng. The Power test for data dependence. IEEE Transactions on Parallel and Distributed Systems, 3(5):591--601, September 1992.
  35. U. Banerjee. Dependence Analysis for Supercomputing. Kluwer Academic Publishers, Boston, MA, 1988.
  36. C. Ancourt and F. Irigoin. Scanning polyhedra with DO loops. In Proceedings of the Third ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 39--50, Williamsburg, VA, April 1991.
  37. R. Triolet. Interprocedural analysis for program restructuring with Parafrase. CSRD Rpt. No. 538, Department of Computer Science, University of Illinois at Urbana-Champaign, December 1985.
  38. D. Maydan, J. Hennessy, and M. Lam. Efficient and exact data dependence analysis. In Proceedings of the SIGPLAN '91 Conference on Programming Language Design and Implementation, Toronto, Canada, June 1991.
  39. F. Irigoin, P. Jouvelot, and R. Triolet. Semantical interprocedural parallelization: An overview of the PIPS project. In Proceedings of the 1991 ACM International Conference on Supercomputing, Cologne, Germany, June 1991.
  40. W. Pugh. Counting solutions to Presburger formulas: How and why. ACM SIGPLAN Notices, 29(6):121--134, 1994.

Published in

SPAA '06: Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures
July 2006
344 pages
ISBN: 1595934529
DOI: 10.1145/1148109

Copyright © 2006 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 447 of 1,461 submissions, 31%
