DOI: 10.1145/1808954.1808959

JCudaMP: OpenMP/Java on CUDA

Published: 01 May 2010

ABSTRACT

We present an OpenMP framework for Java that can exploit an available graphics card as an application accelerator. Dynamic languages (Java, C#, etc.) pose a challenge here because of their write-once-run-everywhere approach. This makes it impossible to make compile-time assumptions about whether an accelerator or graphics card will be available at run time, and of which type.
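To make the programming model concrete, the sketch below shows the kind of data-parallel loop such a framework targets. The comment-style directive is an assumption modeled on OpenMP's C `#pragma omp parallel for`; the abstract does not give the actual syntax, and `SaxpyExample` and `saxpy` are hypothetical names. A standard Java compiler ignores the directive, so the code below runs sequentially and stays portable.

```java
// Hypothetical example; not code from the paper.
class SaxpyExample {
    // Computes y[i] = a * x[i] + y[i] for every element.
    static void saxpy(float a, float[] x, float[] y) {
        // Assumed directive syntax: a standard Java compiler treats it as a comment.
        //#omp parallel for
        for (int i = 0; i < x.length; i++) {
            y[i] = a * x[i] + y[i];
        }
    }

    public static void main(String[] args) {
        float[] x = {1f, 2f, 3f};
        float[] y = {10f, 20f, 30f};
        saxpy(2f, x, y);
        System.out.println(java.util.Arrays.toString(y));
    }
}
```

Each iteration touches only its own index, which is what makes a loop like this a candidate for offloading to an accelerator.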

We present an execution model that analyzes the running environment at run time to determine what hardware is attached. Based on the results, it dynamically rewrites the bytecode and generates the necessary GPGPU code on the fly.
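The run-time decision described above can be illustrated with a much simpler mechanism than bytecode rewriting. The following is a minimal sketch that only selects between two precompiled implementations after probing the environment. All names (`RuntimeDispatch`, `VectorAdd`, `GPU_STUB`, the `demo.gpu` property) are hypothetical, and the GPU path is a sequential stand-in rather than a real kernel launch.

```java
import java.util.function.BooleanSupplier;
import java.util.stream.IntStream;

// Hypothetical sketch; names are illustrative, not the paper's API.
class RuntimeDispatch {
    interface VectorAdd {
        void run(float[] a, float[] b, float[] out);
    }

    // CPU fallback: a multithreaded loop over the index range.
    static final VectorAdd CPU = (a, b, out) ->
        IntStream.range(0, out.length).parallel().forEach(i -> out[i] = a[i] + b[i]);

    // Stand-in for a generated GPU version; a real system would launch a kernel here.
    static final VectorAdd GPU_STUB = (a, b, out) -> {
        for (int i = 0; i < out.length; i++) {
            out[i] = a[i] + b[i];
        }
    };

    // Picks an implementation at run time, based on a hardware probe.
    static VectorAdd select(BooleanSupplier gpuProbe) {
        return gpuProbe.getAsBoolean() ? GPU_STUB : CPU;
    }

    public static void main(String[] args) {
        // Assumed probe: an opt-in system property instead of a real driver query.
        VectorAdd add = select(() -> Boolean.getBoolean("demo.gpu"));
        float[] out = new float[3];
        add.run(new float[]{1, 2, 3}, new float[]{4, 5, 6}, out);
        System.out.println(java.util.Arrays.toString(out));
    }
}
```

Passing the probe as a `BooleanSupplier` keeps the selection logic testable on machines without a graphics card. The same program produces the same result on either path, which is the portability property the abstract emphasizes.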

Furthermore, we solve two additional problems caused by the combination of Java and CUDA. First, CUDA-capable hardware usually has little memory compared to main memory. However, because Java is a pointer-free language, array data can be stored in main memory and buffered in GPU memory. Second, CUDA requires data to be copied to and from the graphics card's memory explicitly. Because modern languages use many small objects, a naive implementation would need many copy operations. The problem is exacerbated because Java implements multi-dimensional arrays as arrays of arrays. A clever copying technique and two new array packages allow for more efficient use of CUDA.
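To see why arrays of arrays are costly here: a Java `float[][]` is an array of references to separately allocated rows, so a naive transfer needs one copy per row. One common remedy, sketched below, is to pack the data into a single row-major buffer that can be moved in one bulk transfer. The abstract does not describe the paper's actual copying technique or array packages, so `FlatArray2D` and its methods are hypothetical and only illustrate the general idea.

```java
// Hypothetical sketch; the paper's array packages are not shown in the abstract.
class FlatArray2D {
    final int rows;
    final int cols;
    final float[] data; // row-major, contiguous storage

    FlatArray2D(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.data = new float[rows * cols];
    }

    float get(int r, int c) {
        return data[r * cols + c];
    }

    void set(int r, int c, float v) {
        data[r * cols + c] = v;
    }

    // Packs a rectangular float[][] into one buffer, one System.arraycopy per row.
    static FlatArray2D pack(float[][] src) {
        int cols = src.length == 0 ? 0 : src[0].length;
        FlatArray2D flat = new FlatArray2D(src.length, cols);
        for (int r = 0; r < src.length; r++) {
            if (src[r].length != cols) {
                throw new IllegalArgumentException("ragged array");
            }
            System.arraycopy(src[r], 0, flat.data, r * cols, cols);
        }
        return flat;
    }

    public static void main(String[] args) {
        FlatArray2D m = pack(new float[][]{{1, 2, 3}, {4, 5, 6}});
        System.out.println(m.get(1, 2));
    }
}
```

After packing, `data` is the only array that would have to cross the bus, regardless of the number of rows.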

Published in: IWMSE '10: Proceedings of the 3rd International Workshop on Multicore Software Engineering
May 2010, 72 pages
ISBN: 9781605589640
DOI: 10.1145/1808954

Copyright © 2010 ACM


Publisher: Association for Computing Machinery, New York, NY, United States
