Generation of parallel synchronization-free tiled code

Computing

Abstract

A novel approach to generating parallel synchronization-free tiled code for loop nests is presented. It combines the Polyhedral and Iteration Space Slicing frameworks. The approach uses the transitive closure of loop nest dependence graphs to correct original rectangular tiles so that all dependences of the original loop nest are preserved under the lexicographic order of the target (corrected) tiles. Parallel synchronization-free tiled code is then generated from the valid (corrected) tiles, again by applying the transitive closure of dependence graphs. The main contribution of the paper is to demonstrate that the presented technique can generate parallel synchronization-free tiled code, provided that the exact transitive closure of the dependence graph can be calculated and that synchronization-free slices exist at the statement instance level in the loop nest. We show that the approach extracts such parallelism in cases where well-known techniques fail to do so. The scope of loop nests for which synchronization-free tiled code can be generated is enlarged by intersecting the extracted slices with the generated valid tiles, in contrast to forming slices of valid tiles, as suggested in previously published techniques based on the transitive closure of a dependence graph. The approach is implemented in the publicly available TC optimizing compiler. Experimental results demonstrating the effectiveness of the approach and the efficiency of the parallel programs it generates are discussed.
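To make the idea of intersecting slices with tiles concrete, the following is a minimal sketch on a small, finite iteration space. It is not the TC compiler's algorithm, which works symbolically on parametric relations. The loop, its bounds `N` and `M`, the tile size `TILE`, and the helper names are all assumptions chosen for illustration. The example loop is `a[i][j] = a[i-1][j]`, for which rectangular tiles already happen to be valid, so no tile correction is shown.

```python
from itertools import product

N, M, TILE = 4, 4, 2  # hypothetical loop bounds and tile size

# Iteration space of: for i in 0..N-1, for j in 0..M-1: a[i][j] = a[i-1][j]
points = list(product(range(N), range(M)))

# Dependence relation R: iteration (i, j) must precede (i + 1, j).
R = {((i, j), (i + 1, j)) for (i, j) in points if i + 1 < N}

def transitive_closure(rel):
    """Exact transitive closure R+ of a finite relation given as a set of pairs."""
    closure = set(rel)
    while True:
        step = {(a, d) for (a, b) in closure for (c, d) in rel if b == c}
        if step <= closure:
            return closure
        closure |= step

R_plus = transitive_closure(R)

# Ultimate dependence sources (UDS): points in domain(R) but not in range(R).
uds = sorted({a for a, _ in R} - {b for _, b in R})

def reach(src):
    """The source itself plus every point it reaches through R+."""
    return {src} | {b for (a, b) in R_plus if a == src}

# Sources that reach a common point belong to one slice; the
# lexicographically minimal source is kept as the slice representative.
slices = {}
for src in uds:
    r = reach(src)
    hits = [rep for rep, pts in slices.items() if pts & r]
    rep = min(hits + [src])
    slices[rep] = r.union(*(slices.pop(h) for h in hits))

def tile_of(p):
    """Identifier of the rectangular tile containing point p."""
    return (p[0] // TILE, p[1] // TILE)

# Intersect each slice with the tiles; within a slice, the pieces are
# visited in lexicographic order of their tile identifiers.
schedule = {}
for rep, pts in slices.items():
    by_tile = {}
    for p in sorted(pts):
        by_tile.setdefault(tile_of(p), []).append(p)
    schedule[rep] = [by_tile[t] for t in sorted(by_tile)]

def preserves_dependences():
    """True if no dependence crosses slices and each slice respects R."""
    for pieces in schedule.values():
        flat = [p for piece in pieces for p in piece]
        pos = {p: k for k, p in enumerate(flat)}
        for a, b in R:
            if (a in pos) != (b in pos):
                return False  # dependence would need synchronization
            if a in pos and pos[a] >= pos[b]:
                return False  # dependence executed in the wrong order
    return True

print(len(slices), preserves_dependences())
```

Each entry of `schedule` could be handed to a separate thread: because no dependence connects two different slices, the threads would not need to synchronize with each other. For loop nests whose rectangular tiles are not valid, the tiles would first have to be corrected, which this sketch does not attempt.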

Notes

  1. If a slice has multiple sources, all of them belong to \(UDS\), but only the lexicographically minimal source is the representative of the slice.

  2. http://tc-optimizer.sourceforge.net.

Author information

Corresponding author

Correspondence to Piotr Skotnicki.

Appendix A: Tile correction for arbitrarily nested parametric affine loops

figure d

About this article

Cite this article

Bielecki, W., Palkowski, M. & Skotnicki, P. Generation of parallel synchronization-free tiled code. Computing 100, 277–302 (2018). https://doi.org/10.1007/s00607-017-0576-3

  • DOI: https://doi.org/10.1007/s00607-017-0576-3
