CnC-CUDA: Declarative Programming for GPUs

  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 2010)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6548)

Abstract

The computer industry is at a major inflection point in its hardware roadmap due to the end of a decades-long trend of exponentially increasing clock frequencies. Instead, future computer systems are expected to be built using homogeneous and heterogeneous many-core processors with tens to hundreds of cores per chip, and complex hardware designs that address the challenges of concurrency, energy efficiency, and resiliency. Unlike previous generations of hardware evolution, this shift towards many-core computing will have a profound impact on software. These software challenges are further compounded by the need to enable parallelism in workloads and application domains that traditionally have not had to contend with multiprocessor parallelism. A recent trend in mainstream desktop systems is the use of graphics processing units (GPUs) to obtain order-of-magnitude performance improvements relative to general-purpose CPUs. Unfortunately, hybrid programming models that support multithreaded execution on CPUs in parallel with CUDA execution on GPUs have proven too complex for mainstream programmers and domain experts, especially when targeting platforms with multiple CPU cores and multiple GPU devices.

In this paper, we extend prior work on Intel’s Concurrent Collections (CnC) programming model to address the hybrid programming challenge with a model called CnC-CUDA. CnC is a declarative and implicitly parallel coordination language that supports flexible combinations of task and data parallelism while retaining determinism. CnC computations are built from steps that are related by data and control dependence edges, represented in a CnC graph. The CnC-CUDA extensions in this paper include the definition of multithreaded steps for execution on GPUs and the automatic generation of data and control flow between CPU steps and GPU steps. Experimental results show that this approach can yield significant performance benefits with both GPU execution and hybrid CPU/GPU execution.
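To make the step/graph model concrete, the following is a minimal, hypothetical sketch in plain CUDA, not the actual CnC-CUDA syntax or runtime API. Each thread executes one step instance, the global thread index plays the role of the step’s tag, and device arrays stand in for the input and output item collections. The kernel name, array names, and launch configuration are all illustrative assumptions; in CnC-CUDA itself, the host/device data movement and kernel launch written out by hand below are the parts the system generates automatically from the CnC graph.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical GPU step: each thread is one step instance, identified by its
// tag (here, the global thread index). It "gets" one item from the input
// collection and "puts" one item into the output collection.
__global__ void squareStep(const float *inItems, float *outItems, int numTags)
{
    int tag = blockIdx.x * blockDim.x + threadIdx.x;
    if (tag < numTags) {
        float x = inItems[tag];   // get(in, tag)
        outItems[tag] = x * x;    // put(out, tag, x * x)
    }
}

int main()
{
    const int numTags = 1024;
    const size_t bytes = numTags * sizeof(float);

    // Host-side item collections.
    float *hIn  = (float *)malloc(bytes);
    float *hOut = (float *)malloc(bytes);
    for (int i = 0; i < numTags; i++) hIn[i] = (float)i;

    // Device-side copies. In CnC-CUDA these transfers would be derived from
    // the data-dependence edges of the CnC graph.
    float *dIn, *dOut;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, hIn, bytes, cudaMemcpyHostToDevice);

    // One multithreaded step launch covers the whole tag collection.
    squareStep<<<(numTags + 255) / 256, 256>>>(dIn, dOut, numTags);
    cudaMemcpy(hOut, dOut, bytes, cudaMemcpyDeviceToHost);

    printf("out[3] = %f\n", hOut[3]);  // expect 9.0

    cudaFree(dIn); cudaFree(dOut); free(hIn); free(hOut);
    return 0;
}
```

The intended division of labor is that the programmer writes only the step body and the graph; the boilerplate in `main()` above is what a CnC-CUDA code generator is meant to absorb.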




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Grossman, M., Simion Sbîrlea, A., Budimlić, Z., Sarkar, V. (2011). CnC-CUDA: Declarative Programming for GPUs. In: Cooper, K., Mellor-Crummey, J., Sarkar, V. (eds) Languages and Compilers for Parallel Computing. LCPC 2010. Lecture Notes in Computer Science, vol 6548. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19595-2_16

  • DOI: https://doi.org/10.1007/978-3-642-19595-2_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19594-5

  • Online ISBN: 978-3-642-19595-2

  • eBook Packages: Computer Science; Computer Science (R0)
