Polyhedral AST Generation Is More Than Scanning Polyhedra

Published: 15 July 2015

Abstract

Abstract mathematical representations such as integer polyhedra have been shown to be useful to precisely analyze computational kernels and to express complex loop transformations. Such transformations rely on abstract syntax tree (AST) generators to convert the mathematical representation back to an imperative program. Such generic AST generators avoid the need to resort to transformation-specific code generators, which may be very costly or technically difficult to develop as transformations become more complex. Existing AST generators have proven their effectiveness, but they hit limitations in more complex scenarios. Specifically, (1) they do not support or may fail to generate control flow for complex transformations using piecewise schedules or mappings involving modulo arithmetic; (2) they offer limited support for the specialization of the generated code exposing compact, straightline, vectorizable kernels with high arithmetic intensity necessary to exploit the peak performance of modern hardware; (3) they offer no support for memory layout transformations; and (4) they provide insufficient control over the AST generation strategy, preventing their application to complex domain-specific optimizations.
We present a new AST generation approach that extends classical polyhedral scanning to the full generality of Presburger arithmetic, including existentially quantified variables and piecewise schedules, and introduce new optimizations for the detection of components and shifted strides. Not limiting ourselves to control flow generation, we expose functionality to generate AST expressions from arbitrary piecewise quasi-affine expressions, which enables the use of our AST generator for data-layout transformations. We complement this with support for specialization by polyhedral unrolling, user-directed versioning, and specialization of AST expressions according to the location at which they are generated, and we complete this work with fine-grained user control over the AST generation strategies used. Using this generalized idea of AST generation, we present how to implement complex domain-specific transformations without the need to write specialized code generators, but instead relying on a generic AST generator parametrized to a specific problem domain.
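
To give a feel for why stride detection matters when a schedule involves modulo arithmetic, the following minimal sketch scans a one-dimensional set of the form { i : lo <= i <= hi and i mod stride = offset mod stride } in two ways. It is an illustration under simplifying assumptions, not the algorithm of the article or the API of any library; the function names `scan_with_guard` and `scan_with_stride` are hypothetical.

```python
def scan_with_guard(lo, hi, stride, offset):
    """Naive scan: visit every integer in [lo, hi] and keep those
    satisfying the modulo constraint (a guard inside the loop)."""
    return [i for i in range(lo, hi + 1) if (i - offset) % stride == 0]


def scan_with_stride(lo, hi, stride, offset):
    """Strided scan: shift the lower bound to the first point >= lo that is
    congruent to offset modulo stride, then step by stride with no guard.
    Assumes stride > 0, so Python's % yields a non-negative remainder."""
    first = lo + (offset - lo) % stride
    return list(range(first, hi + 1, stride))


if __name__ == "__main__":
    # Both scans enumerate the same points; the second has no inner test.
    print(scan_with_guard(0, 10, 3, 1))
    print(scan_with_stride(0, 10, 3, 1))
```

The second form corresponds to a loop such as `for (i = first; i <= hi; i += stride)`, which avoids the per-iteration conditional and is generally easier for a downstream compiler to vectorize.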

    Published In

    ACM Transactions on Programming Languages and Systems  Volume 37, Issue 4
    August 2015
    204 pages
    ISSN:0164-0925
    EISSN:1558-4593
    DOI:10.1145/2807424
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 15 July 2015
    Accepted: 01 March 2015
    Revised: 01 December 2014
    Received: 01 September 2014
    Published in TOPLAS Volume 37, Issue 4

    Author Tags

    1. Polyhedral compilation
    2. Presburger relations
    3. code generation
    4. index set splitting
    5. unrolling

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • ARTEMIS project
    • Swissuniversities through the Platform for Advanced Computing Initiative (PASC)
    • LIACS from Intel Corporation
    • Google Europe Fellowship in Efficient Computing
    • European FP7 project
