DOI: 10.1145/2159430.2159433

FLAT: a GPU programming framework to provide embedded MPI

Published: 3 March 2012

ABSTRACT

To leverage multiple GPUs in a cluster system, application tasks must be assigned to the GPUs and executed using communication primitives that handle data transfer among them. In current GPU programming models, communication primitives such as MPI functions cannot be used within GPU kernels; instead, they must be called from CPU code. Programmers therefore have to manage both the GPU kernel and the CPU-side communication code, which makes GPU programming and its optimization very difficult. A minimal sketch of this conventional host-driven pattern follows (buffer names and sizes are hypothetical, for illustration only):
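
// Conventional MPI+CUDA: the communication logic lives on the host,
// split away from the kernel that produces the data.
#include <mpi.h>
#include <cuda_runtime.h>

__global__ void scale(float *d_data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d_data[i] *= 2.0f;   // some per-element work
}

void step(float *d_data, float *h_buf, int n, int peer) {
    scale<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaDeviceSynchronize();
    // The programmer must stage device data through host memory...
    cudaMemcpy(h_buf, d_data, n * sizeof(float), cudaMemcpyDeviceToHost);
    // ...and issue the MPI call in CPU code, outside the kernel.
    MPI_Send(h_buf, n, MPI_FLOAT, peer, 0, MPI_COMM_WORLD);
}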

In this paper, we propose a programming framework named FLAT, which enables programmers to use MPI functions within GPU kernels. Our framework automatically transforms MPI functions written in a GPU kernel into runtime routines executed on the CPU. We describe the execution model and implementation of FLAT, and discuss its applicability in terms of scalability and programmability. We also evaluate the performance of FLAT; the results show that it achieves good scalability for the intended applications.
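
Since the abstract does not show FLAT's concrete syntax, the following is only a speculative sketch of what embedded MPI might look like under such a framework: the MPI call is written directly in the kernel, and FLAT's source-to-source transformation is what makes it executable, by extracting the call into a CPU-side runtime routine. All names here are hypothetical.

// Hypothetical FLAT-style source: an MPI call embedded in the kernel.
// This would not compile under plain CUDA; per the abstract, FLAT's
// transformation turns the embedded call into a runtime routine that
// runs on the CPU on the kernel's behalf.
__global__ void scale_and_send(float *data, int n, int peer) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
    // Written as if callable from device code; FLAT rewrites this into
    // a CPU-executed communication routine at compile time.
    MPI_Send(data, n, MPI_FLOAT, peer, 0, MPI_COMM_WORLD);
}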


Published in

GPGPU-5: Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units
March 2012, 122 pages
ISBN: 9781450312332
DOI: 10.1145/2159430

        Copyright © 2012 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 57 of 129 submissions, 44%
