Skip to main content

Automatic Skeleton-Based Compilation through Integration with an Algorithm Classification

  • Conference paper
Advanced Parallel Processing Technologies (APPT 2013)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 8299))

Included in the following conference series:

Abstract

This paper presents a technique to fully automatically generate efficient and readable code for parallel processors. We base our approach on skeleton-based compilation and ‘algorithmic species’, an algorithm classification of program code. We use a tool to automatically annotate C code with species information where possible. The annotated program code is subsequently fed into the skeleton-based source-to-source compiler ‘Bones’, which generates OpenMP, OpenCL or CUDA code and optimises host-accelerator transfers. This results in a unique approach, integrating a skeleton-based compiler for the first time into an automated flow. We demonstrate the benefits of our approach on the PolyBench suite by showing average speed-ups of 1.4x and 1.6x for GPU code compared to ppcg and Par4All, two state-of-the-art compilers.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Amini, M., Creusillet, B., Even, S., Keryell, R., Goubier, O., Guelton, S., Mcmahon, J.O., Pasquier, F.-X., Péan, G., Villalon, P.: Par4All: From Convex Array Regions to Heterogeneous Computing. In: IMPACT 2012: Second International Workshop on Polyhedral Compilation Techniques (2012)

    Google Scholar 

  2. Baskaran, M.M., Ramanujam, J., Sadayappan, P.: Automatic C-to-CUDA Code Generation for Affine Programs. In: Gupta, R. (ed.) CC 2010. LNCS, vol. 6011, pp. 244–263. Springer, Heidelberg (2010)

    Chapter  Google Scholar 

  3. Caarls, W., Jonker, P., Corporaal, H.: Algorithmic Skeletons for Stream Programming in Embedded Heterogeneous Parallel Image Processing Applications. In: IPDPS: Int. Parallel and Distributed Processing Symposium. IEEE (2006)

    Google Scholar 

  4. Cole, M.: Algorithmic Skeletons: Structured Management of Parallel Computation. MIT Press (1991)

    Google Scholar 

  5. Custers, P.: Algorithmic Species: Classifying Program Code for Parallel Computing. Master’s thesis, Eindhoven University of Technology (2012)

    Google Scholar 

  6. Dolbeau, R., Bihan, S., Bodin, F.: HMPP: A Hybrid Multi-core Parallel Programming Environment. In: GPGPU-1: 1st Workshop on General Purpose Processing on Graphics Processing Units (2007)

    Google Scholar 

  7. Enmyren, J., Kessler, C.W.: SkePU: A Multi-backend Skeleton Programming Library for Multi-GPU Systems. In: HLPP 2010: 4th International Workshop on High-level Parallel Programming and Applications. ACM (2010)

    Google Scholar 

  8. Feautrier, P.: Dataflow Analysis of Array and Scalar References. Springer International Journal of Parallel Programming 20, 23–53 (1991)

    Article  MATH  Google Scholar 

  9. Grauer-Gray, S., Xu, L., Searles, R., Ayalasomayajula, S., Cavazos, J.: Auto-tuning a High-Level Language Targeted to GPU Codes. In: Workshop on Innovative Parallel Computing (2012)

    Google Scholar 

  10. Guelton, S., Amini, M., Creusillet, B.: Beyond Do Loops: Data Transfer Generation with Convex Array Regions. In: Kasahara, H., Kimura, K. (eds.) LCPC 2012. LNCS, vol. 7760, pp. 249–263. Springer, Heidelberg (2013)

    Chapter  Google Scholar 

  11. Han, T., Abdelrahman, T.: hiCUDA: High-Level GPGPU Programming. IEEE Transactions on Parallel and Distributed Systems 22, 78–90 (2011)

    Article  Google Scholar 

  12. Jablin, T., Jablin, J., Prabhu, P., Liu, F., August, D.: Dynamically Managed Data for CPU-GPU Architectures. In: CGO 2012: International Symposium on Code Generation and Optimization. ACM (2012)

    Google Scholar 

  13. Khan, M., Basu, P., Rudy, G., Hall, M., Chen, C., Chame, J.: A Script-Based Autotuning Compiler System to Generate High-Performance CUDA Code. ACM Transactions on Architecture and Code Optimisations 9(4), Article 31 (January 2013)

    Google Scholar 

  14. Lee, Y., Krashinsky, R., Grover, V., Keckler, S.W., Asanovic, K.: Convergence and Scalarization for Data-Parallel Architectures. In: CGO 2013: International Symposium on Code Generation and Optimization. IEEE (2013)

    Google Scholar 

  15. Nugteren, C., Corporaal, H.: Introducing ‘Bones’: A Parallelizing Source-to-Source Compiler Based on Algorithmic Skeletons. In: GPGPU-5: 5th Workshop on General Purpose Processing on Graphics Processing Units. ACM (2012)

    Google Scholar 

  16. Nugteren, C., Corvino, R., Corporaal, H.: Algorithmic Species Revisited: A Program Code Classification Based on Array References. In: MuCoCoS 2013: International Workshop on Multi-/Many-core Computing Systems (2013)

    Google Scholar 

  17. Nugteren, C., Custers, P., Corporaal, H.: Algorithmic Species: An Algorithm Classification of Affine Loop Nests for Parallel Programming. ACM TACO: Transactions on Architecture and Code Optimisations 9(4), Article 40 (2013)

    Google Scholar 

  18. Olschanowsky, C., Snavely, A., Meswani, M., Carrington, L.: PIR: PMaC’s Idiom Recognizer. In: ICPPW 2010: 39th International Conference on Parallel Processing Workshops. IEEE (2010)

    Google Scholar 

  19. Park, E., Pouchet, L.-N., Cavazos, J., Cohen, A., Sadayappan, P.: Predictive Modeling in a Polyhedral Optimization Space. In: CGO 2011: International Symposium on Code Generation and Optimization. IEEE (2011)

    Google Scholar 

  20. Shen, J., Fang, J., Sips, H., Varbanescu, A.: Performance Gaps between OpenMP and OpenCL for Multi-core CPUs. In: ICPPW: International Conference on Parallel Processing Workshops. IEEE (2012)

    Google Scholar 

  21. Steuwer, M., Kegel, P., Gorlatch, S.: SkelCL - A Portable Skeleton Library for High-Level GPU Programming. In: IPDPSW 2011: International Symposium on Parallel and Distributed Processing Workshops and PhD Forum. IEEE (2011)

    Google Scholar 

  22. Verdoolaege, S., Carlos Juega, J., Cohen, A., Ignacio Gómez, J., Tenllado, C., Catthoor, F.: Polyhedral Parallel Code Generation for CUDA. ACM Transactions on Architecture and Code Optimisations 9(4), Article 54 (January 2013)

    Google Scholar 

  23. Wolfe, M.: Implementing the PGI Accelerator Model. In: GPGPU-3: 3rd Workshop on General Purpose Processing on Graphics Processing Units. ACM (2010)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nugteren, C., Custers, P., Corporaal, H. (2013). Automatic Skeleton-Based Compilation through Integration with an Algorithm Classification. In: Wu, C., Cohen, A. (eds) Advanced Parallel Processing Technologies. APPT 2013. Lecture Notes in Computer Science, vol 8299. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45293-2_14

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-45293-2_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-45292-5

  • Online ISBN: 978-3-642-45293-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics