
Fine-Grained Treatment to Synchronizations in GPU-to-CPU Translation

  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 2011)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 7146)

Abstract

GPU-to-CPU translation can extend the execution of Graphics Processing Unit (GPU) programs to multi-core and many-core CPUs, and hence enable cross-device task migration and promote whole-system synergy. This paper describes some of our findings on the treatment of GPU synchronizations during the translation process. We show that careful dependence analysis can enable fine-grained treatment of synchronizations and reveal redundant computation at the instruction-instance level. Based on thread-level dependence graphs, we present a method that performs such fine-grained treatment automatically. Experiments demonstrate that, compared to existing translations, the new approach can yield speedups by integer factors.
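To make the idea concrete, here is a minimal Python sketch of how instance-level dependence edges across a single `__syncthreads()` barrier might drive the translation. It is not the authors' method. The kernel model is hypothetical: `Kernel`, `run_coarse`, `run_fine`, `analyze`, the `deps` callback, and the statements S1/S2 are all illustrative names. The sketch assumes that the pre-barrier statement S1 only writes a shared array. It also assumes that the edges S1(p) → S2(t) are supplied by hand rather than derived from array subscripts. The coarse translation always splits the thread loop at the barrier (loop fission). The fine-grained one keeps the split only when some edge crosses threads, and it skips the S1 instances that no S2 instance reads.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

N = 8  # threads per block (hypothetical)

Shared = Dict[int, float]  # stands in for a __shared__ array; a missing key = never written


@dataclass
class Kernel:
    """Hypothetical kernel body with one barrier: S1; __syncthreads(); S2."""
    s1: Callable[[int, List[float], Shared], None]       # S1(t): writes shared[t] only
    s2: Callable[[int, Shared, Dict[int, float]], None]  # S2(t): reads shared, writes out
    deps: Callable[[int], List[int]]                      # threads p whose S1(p) feeds S2(t)


def run_coarse(k: Kernel, inp: List[float]) -> Tuple[Dict[int, float], int, bool]:
    """Loop fission at the barrier: run every S1 instance, then every S2 instance."""
    shared: Shared = {}
    out: Dict[int, float] = {}
    for t in range(N):
        k.s1(t, inp, shared)
    for t in range(N):
        k.s2(t, shared, out)
    return out, N, True  # (result, S1 instances run, barrier kept)


def analyze(k: Kernel) -> Tuple[Set[int], bool]:
    """Collect thread-level edges S1(p) -> S2(t) and summarize them."""
    edges = [(p, t) for t in range(N) for p in k.deps(t)]
    needed = {p for p, _ in edges}                # S1 instances whose results are read
    thread_local = all(p == t for p, t in edges)  # True if no edge crosses threads
    return needed, thread_local


def run_fine(k: Kernel, inp: List[float]) -> Tuple[Dict[int, float], int, bool]:
    """Use the edges to drop the barrier and/or skip redundant S1 instances."""
    needed, thread_local = analyze(k)
    shared: Shared = {}
    out: Dict[int, float] = {}
    if thread_local:
        # Each S2(t) reads only S1(t): drop the barrier and fuse the two loops.
        for t in range(N):
            if t in needed:
                k.s1(t, inp, shared)
            k.s2(t, shared, out)
    else:
        # Cross-thread edges remain, so keep the split, but run only the
        # S1 instances that some S2 instance actually reads.
        for t in sorted(needed):
            k.s1(t, inp, shared)
        for t in range(N):
            k.s2(t, shared, out)
    return out, len(needed), not thread_local


# Kernel A (hypothetical): every fourth thread sums its own and its neighbour's value.
def a_s1(t: int, inp: List[float], shared: Shared) -> None:
    shared[t] = inp[t] * inp[t]


def a_s2(t: int, shared: Shared, out: Dict[int, float]) -> None:
    if t % 4 == 0:
        out[t // 4] = shared[t] + shared[t + 1]


KERNEL_A = Kernel(a_s1, a_s2, lambda t: [t, t + 1] if t % 4 == 0 else [])


# Kernel B (hypothetical): each thread reads back only its own value.
def b_s1(t: int, inp: List[float], shared: Shared) -> None:
    shared[t] = inp[t] + 1.0


def b_s2(t: int, shared: Shared, out: Dict[int, float]) -> None:
    out[t] = shared[t] * 2.0


KERNEL_B = Kernel(b_s1, b_s2, lambda t: [t])

if __name__ == "__main__":
    inp = [float(i + 1) for i in range(N)]
    print(run_coarse(KERNEL_A, inp))  # ({0: 5.0, 1: 61.0}, 8, True)
    print(run_fine(KERNEL_A, inp))    # ({0: 5.0, 1: 61.0}, 4, True)
    print(run_fine(KERNEL_B, inp))    # all 8 outputs, 8 S1 instances, barrier removed (False)
```

For kernel A, the barrier must stay because S2(t) reads S1(t + 1), but only 4 of the 8 S1 instances are ever consumed. For kernel B, every edge is thread-local, so the two loops can be fused. Only flow dependences through the shared array are modeled here. A real translator would also have to respect anti- and output dependences, as well as any side effects of skipped instances, before fusing loops or dropping work.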

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Guo, Z., Shen, X. (2013). Fine-Grained Treatment to Synchronizations in GPU-to-CPU Translation. In: Rajopadhye, S., Mills Strout, M. (eds) Languages and Compilers for Parallel Computing. LCPC 2011. Lecture Notes in Computer Science, vol 7146. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36036-7_12

  • DOI: https://doi.org/10.1007/978-3-642-36036-7_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36035-0

  • Online ISBN: 978-3-642-36036-7

  • eBook Packages: Computer Science, Computer Science (R0)
