Abstract
A novel approach to generating parallel synchronization-free tiled code for loop nests is presented. It combines the Polyhedral and Iteration Space Slicing frameworks. The approach uses the transitive closure of loop nest dependence graphs to correct original rectangular tiles so that all dependences of the original loop nest are preserved under the lexicographic order of the target (corrected) tiles. Parallel synchronization-free tiled code is then generated from the valid (corrected) tiles, again by applying the transitive closure of dependence graphs. The main contribution of the paper is to demonstrate that the technique can generate parallel synchronization-free tiled code, provided that the exact transitive closure of the dependence graph can be calculated and that synchronization-free slices exist at the statement instance level in the loop nest. We show that the approach extracts such parallelism in cases where well-known techniques fail to do so. The scope of loop nests for which synchronization-free tiled code can be generated is enlarged by intersecting the extracted slices with the generated valid tiles, in contrast to forming slices of valid tiles, as suggested in previously published techniques based on the transitive closure of a dependence graph. The approach is implemented in the publicly available TC optimizing compiler. Experimental results demonstrating the effectiveness of the approach and the efficiency of the parallel programs it generates are discussed.
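To make the notion of synchronization-free slices at the statement instance level concrete, the following is a minimal sketch on a small, explicitly enumerated iteration space. It is not the symbolic, transitive-closure-based procedure of the paper or of the TC compiler: the dependence pattern, the function names, and the use of union-find over a finite graph are all assumptions chosen for illustration. Slices are taken to be the weakly connected components of the dependence graph, and each slice is identified by its lexicographically minimal source.

```python
from itertools import product


def build_dependences(n_i, n_j):
    """Hypothetical dependence relation on a 2-D iteration space:
    (i, j) -> (i, j + 1) and (i, j) -> (i + 2, j + 1)."""
    space = set(product(range(n_i), range(n_j)))
    rel = set()
    for (i, j) in space:
        for target in ((i, j + 1), (i + 2, j + 1)):
            if target in space:
                rel.add(((i, j), target))
    return space, rel


def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def slice_representatives(space, rel):
    """Map each slice representative to the sorted iterations of its slice.

    A slice is a weakly connected component of the dependence graph.
    Sources are iterations with outgoing but no incoming dependences.
    Iterations without any dependence are skipped in this sketch.
    """
    parent = {p: p for p in space}
    for src, dst in rel:
        parent[find(parent, src)] = find(parent, dst)

    sources = {s for s, _ in rel} - {d for _, d in rel}

    components = {}
    for p in space:
        components.setdefault(find(parent, p), []).append(p)

    result = {}
    for members in components.values():
        slice_sources = [p for p in members if p in sources]
        if slice_sources:
            # Python compares tuples lexicographically, so min() picks
            # the lexicographically minimal source of the slice.
            result[min(slice_sources)] = sorted(members)
    return result


space, rel = build_dependences(4, 3)
reps = slice_representatives(space, rel)
for rep, members in sorted(reps.items()):
    print(rep, members)
```

In this assumed pattern, even and odd values of `i` never depend on each other, so the two resulting slices could run in parallel without synchronization. Each slice has two sources, and only the smaller one serves as its representative.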





Notes
If a slice has multiple sources, all of them belong to \(UDS\), but only the lexicographically minimal source serves as the representative of the slice.
Appendix A: Tile correction for arbitrarily nested parametric affine loops

Cite this article
Bielecki, W., Palkowski, M. & Skotnicki, P. Generation of parallel synchronization-free tiled code. Computing 100, 277–302 (2018). https://doi.org/10.1007/s00607-017-0576-3
DOI: https://doi.org/10.1007/s00607-017-0576-3
Keywords
- Synchronization-free parallelism
- Tiling
- Transitive closure
- Optimizing compiler
- Polyhedral model
- Iteration space slicing