Improving compiler scalability: optimizing large programs at small price (PLDI '15)

ABSTRACT
Compiler scalability is a well-known problem: reasoning about the application of useful optimizations over large program scopes consumes too much time and memory during compilation. This problem is exacerbated in polyhedral compilers, which use powerful yet costly integer programming algorithms to compose loop optimizations. As a result, the benefits that a polyhedral compiler can offer to real-world programs, such as scientific applications containing long sequences of loop nests, remain out of reach for most users. In this work, we address this scalability problem in polyhedral compilers. We identify three causes of unscalability, each of which stems from the large number of statements and dependences in the program scope. We propose a one-shot solution to the problem: reducing the effective number of statements and dependences as seen by the compiler. We achieve this by representing each sequence of statements in a program by a single super-statement. The resulting set of super-statements exposes the minimum sufficient constraints to the Integer Linear Programming (ILP) solver for finding correct optimizations. We implement our approach in the PLuTo polyhedral compiler and find that it condenses the program statements and program dependences by factors of 4.7x and 6.4x, respectively, averaged over 9 hot regions (ranging from 48 to 121 statements) in 5 real applications. As a result, compilation time and memory requirements improve by 268x and 20x, respectively, over the latest version of the PLuTo compiler. The final compile times are comparable to those of the Intel compiler, while the generated code performs 1.92x better on average due to the latter's conservative approach to loop optimization.
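The condensation idea can be illustrated with a toy sketch. This is not the paper's actual algorithm, which operates on polyhedral dependences; the grouping criterion used here (maximal runs of consecutive statements belonging to the same loop nest) and all names are illustrative assumptions. The sketch only shows the bookkeeping effect the abstract describes: fewer statements and fewer dependence edges reach the ILP-based optimizer.

```python
# Toy sketch (illustrative, not PLuTo's condensation algorithm):
# group consecutive statements of the same loop nest into a
# "super-statement" and collapse duplicate dependence edges.
from itertools import groupby

def condense(statements, deps):
    """statements: list of (stmt_id, loop_nest_id) in program order.
    deps: set of (src_stmt, dst_stmt) dependence edges.
    Returns (super_statements, super_deps)."""
    # Group maximal runs of statements sharing a loop nest
    # (groupby groups consecutive items, matching program order).
    stmt_to_super = {}
    supers = []
    for _nest, run in groupby(statements, key=lambda s: s[1]):
        members = [sid for sid, _ in run]
        for sid in members:
            stmt_to_super[sid] = len(supers)
        supers.append(members)
    # Keep one edge per pair of distinct super-statements;
    # edges internal to a super-statement disappear.
    super_deps = {(stmt_to_super[a], stmt_to_super[b])
                  for a, b in deps
                  if stmt_to_super[a] != stmt_to_super[b]}
    return supers, super_deps

stmts = [("S0", 0), ("S1", 0), ("S2", 0), ("S3", 1), ("S4", 1)]
edges = {("S0", "S1"), ("S1", "S2"), ("S0", "S2"),
         ("S2", "S3"), ("S3", "S4")}
supers, sdeps = condense(stmts, edges)
print(len(stmts), len(edges))   # 5 statements, 5 dependences
print(len(supers), len(sdeps))  # 2 super-statements, 1 dependence
```

Here five statements with five dependences condense to two super-statements linked by a single dependence, which is the kind of reduction (4.7x fewer statements, 6.4x fewer dependences on real hot regions) the abstract reports.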