Generation of parallel synchronization-free tiled code

Bielecki, Wlodzimierz; Palkowski, Marek; Skotnicki, Piotr

doi:10.1007/s00607-017-0576-3

Published: 20 October 2017

Computing, volume 100, pages 277–302 (2018)

Abstract

A novel approach to generating parallel synchronization-free tiled code for a loop nest is presented. It combines the Polyhedral and Iteration Space Slicing frameworks. The approach uses the transitive closure of loop nest dependence graphs to correct the original rectangular tiles so that all dependences of the original loop nest are preserved under the lexicographic order of the target (corrected) tiles. Parallel synchronization-free tiled code is then generated from the valid (corrected) tiles, again by applying the transitive closure of dependence graphs. The main contribution of the paper is demonstrating that the presented technique is able to generate parallel synchronization-free tiled code, provided that the exact transitive closure of a dependence graph can be calculated and that synchronization-free slices exist at the statement instance level in the loop nest. We show that the presented approach extracts such parallelism when well-known techniques fail to extract it. The scope of loop nests for which synchronization-free tiled code can be generated is enlarged by intersecting the extracted slices with the generated valid tiles, in contrast to forming slices of valid tiles as suggested in previously published techniques based on the transitive closure of a dependence graph. The presented approach is implemented in the publicly available TC optimizing compiler. Experimental results demonstrating the effectiveness of the approach and the efficiency of the parallel programs it generates are discussed.
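The idea of intersecting slices with tiles can be illustrated on a toy example. The sketch below is not the TC compiler's algorithm: it enumerates a small iteration space explicitly instead of working with parametric relations, and the loop nest, the sizes `N` and `TILE`, and all function names are assumptions made for illustration. The chosen nest carries dependences along `j` only, so rectangular tiles are already valid and no tile correction is needed, and each slice has a single source, so slices do not have to be merged.

```python
from itertools import product

N, TILE = 4, 2  # hypothetical problem size and tile width

# Iteration space of a toy nest: for i, for j: a[i][j] = a[i][j-1] + 1
space = list(product(range(N), range(N)))

# Dependence relation on statement instances: (i, j-1) -> (i, j).
deps = {((i, j - 1), (i, j)) for i, j in space if j > 0}


def transitive_closure(rel):
    """Explicit R+ of a finite relation, by naive fixed-point iteration."""
    closure = set(rel)
    while True:
        extra = {(a, d) for a, b in closure for c, d in rel if b == c}
        if extra <= closure:
            return closure
        closure |= extra


closure = transitive_closure(deps)

# Sources that are not the target of any dependence.
domain = {src for src, _ in deps}
targets = {dst for _, dst in deps}
sources = sorted(domain - targets)

# A slice is a source plus every instance reachable from it through R+.
slices = {s: {s} | {d for a, d in closure if a == s} for s in sources}


def tile_of(point):
    """Identifier of the rectangular tile that contains an iteration."""
    return tuple(x // TILE for x in point)


# Intersect each slice with the tiles. Every slice keeps its own tile
# pieces, ordered lexicographically by tile identifier.
schedule = {}
for s, members in slices.items():
    tiles = {}
    for p in sorted(members):
        tiles.setdefault(tile_of(p), []).append(p)
    schedule[s] = sorted(tiles.items())
```

Each key of `schedule` identifies one slice that could be handed to a separate thread, since no dependence in this toy nest crosses a slice boundary. Within a slice, the tile pieces are executed serially in the listed order.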

Notes

If a slice has multiple sources, then although all its sources belong to $UDS$, only the lexicographically minimal source is the representative of a slice.
http://tc-optimizer.sourceforge.net.
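The first note can be made concrete with a small sketch. It assumes that $UDS$ denotes the set of ultimate dependence sources, that is, dependence sources that are not the target of any dependence. The dependence pattern, the size `N`, and the union-find helper are hypothetical and chosen only so that each slice has several sources.

```python
from itertools import product

N = 4  # hypothetical problem size
space = list(product(range(N), range(N)))
space_set = set(space)

# Toy dependences: (i, j) -> (i+2, j) and (i, j) -> (i+2, j+1).
deps = [(p, q) for p in space
        for q in ((p[0] + 2, p[1]), (p[0] + 2, p[1] + 1))
        if q in space_set]

# Union-find over instances: a slice is taken here to be a weakly
# connected component of the dependence graph.
parent = {p: p for p in space}


def find(p):
    """Return the component root of p, halving the path on the way."""
    while parent[p] != p:
        parent[p] = parent[parent[p]]
        p = parent[p]
    return p


for src, dst in deps:
    parent[find(src)] = find(dst)

# Assumed meaning of UDS: sources that are never a dependence target.
domain = {src for src, _ in deps}
targets = {dst for _, dst in deps}
uds = sorted(domain - targets)

# Group the sources by slice and keep the lexicographic minimum of each
# group; Python compares tuples lexicographically, so min() suffices.
sources_by_slice = {}
for s in uds:
    sources_by_slice.setdefault(find(s), []).append(s)

representatives = sorted(min(group) for group in sources_by_slice.values())
```

In this toy case the even rows form one slice and the odd rows another. Each slice has four sources in `uds`, and the representatives are `(0, 0)` and `(1, 0)`.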

Author information

Authors and Affiliations

Faculty of Computer Science, West Pomeranian University of Technology, ul. Zolnierska 49, 71-210, Szczecin, Poland
Wlodzimierz Bielecki, Marek Palkowski & Piotr Skotnicki

Corresponding author

Correspondence to Piotr Skotnicki.

Appendix A: Tile correction for arbitrarily nested parametric affine loops

About this article

Cite this article

Bielecki, W., Palkowski, M. & Skotnicki, P. Generation of parallel synchronization-free tiled code. Computing 100, 277–302 (2018). https://doi.org/10.1007/s00607-017-0576-3

Received: 22 August 2016
Accepted: 05 October 2017
Published: 20 October 2017
Issue Date: March 2018
DOI: https://doi.org/10.1007/s00607-017-0576-3
