
Dynamic and Speculative Polyhedral Parallelization Using Compiler-Generated Skeletons

International Journal of Parallel Programming

Abstract

We propose a framework for the speculative parallelization of scientific nested-loop kernels, based on an original generation and use of algorithmic skeletons, which applies polyhedral transformations to the target code at run time in order to expose parallelism and data locality. Parallel code generation incurs almost no cost: binary algorithmic skeletons generated at compile time embed the original code together with the operations needed to instantiate a polyhedral parallelizing transformation and to verify the speculated dependences. The skeletons are patched at run time to produce the executable code. The run-time system selects a transformation using online profiling phases over short samples, executed with an instrumented version of the code. During these phases, the accessed memory addresses are used to compute dependence distance vectors on the fly, and are also interpolated to build a predictor of the forthcoming accesses. The interpolating functions and distance vectors then feed a dependence analysis that selects a parallelizing transformation which, if the prediction is correct, induces no rollback during execution. To keep the rollback overhead low, the code is executed in successive slices of the outermost loop of the original nest. Each slice runs either a parallel version that instantiates a skeleton, the original sequential version, or an instrumented version. Moreover, this slicing of the execution makes it possible to transform the code differently across observed execution phases, by patching one of the pre-built skeletons in a different way. The framework has been implemented as extensions of the LLVM compiler together with an x86-64 runtime system. Significant speed-ups are shown on a set of benchmarks that could not have been handled efficiently by a static compiler.
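To make the execution model concrete, the following C sketch illustrates the slicing scheme described in the abstract, assuming a chunk-level runtime driver: each slice of the outermost loop runs either instrumented (to profile and predict accesses), in parallel through a patched skeleton, or as the original sequential code, and a mispredicted parallel slice is rolled back and replayed sequentially. All identifiers below are hypothetical stand-ins for the real runtime machinery, not the framework's actual interface.

/* Minimal, hypothetical sketch of the sliced speculative execution described
   above. The runtime entry points declared below are assumed and stand in
   for the actual LLVM/runtime components of the framework. */
#include <stdbool.h>

enum version { INSTRUMENTED, PARALLEL_SKELETON, ORIGINAL_SEQUENTIAL };

/* Assumed runtime services (declarations only, provided elsewhere). */
void run_instrumented_slice(long lo, long hi);   /* profile accesses, build interpolating predictors */
bool select_transformation(void);                /* dependence test on distance vectors + predictors */
void patch_skeleton(void);                       /* instantiate the chosen polyhedral transformation */
bool run_parallel_slice(long lo, long hi);       /* returns false if the speculation is invalidated  */
void run_original_slice(long lo, long hi);       /* original sequential code                         */
void rollback_slice(long lo, long hi);           /* restore memory touched by the failed slice       */

void speculative_nest(long n_iters, long slice_size)
{
    enum version v = INSTRUMENTED;
    for (long lo = 0; lo < n_iters; lo += slice_size) {
        long hi = (lo + slice_size < n_iters) ? lo + slice_size : n_iters;
        switch (v) {
        case INSTRUMENTED:
            run_instrumented_slice(lo, hi);
            if (select_transformation()) {       /* prediction supports a rollback-free schedule */
                patch_skeleton();
                v = PARALLEL_SKELETON;
            } else {
                v = ORIGINAL_SEQUENTIAL;
            }
            break;
        case PARALLEL_SKELETON:
            if (!run_parallel_slice(lo, hi)) {   /* misprediction: undo, replay sequentially, re-profile */
                rollback_slice(lo, hi);
                run_original_slice(lo, hi);
                v = INSTRUMENTED;
            }
            break;
        case ORIGINAL_SEQUENTIAL:
            run_original_slice(lo, hi);
            v = INSTRUMENTED;                    /* re-sample to catch a new execution phase */
            break;
        }
    }
}

The state-machine policy shown here (re-profiling after every sequential slice) is only one plausible choice; the paper's framework decides per slice which version to run and which skeleton to patch, based on the observed execution phases.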



Author information


Corresponding author

Correspondence to Philippe Clauss.


About this article

Cite this article

Jimborean, A., Clauss, P., Dollinger, JF. et al. Dynamic and Speculative Polyhedral Parallelization Using Compiler-Generated Skeletons. Int J Parallel Prog 42, 529–545 (2014). https://doi.org/10.1007/s10766-013-0259-4

