DOI: 10.1145/2145816.2145863
poster

OpenCL as a unified programming model for heterogeneous CPU/GPU clusters

Published: 25 February 2012

Abstract

In this paper, we propose an OpenCL framework for heterogeneous CPU/GPU clusters, and show that the framework achieves both high performance and ease of programming. The framework gives the user the illusion of a single system. It allows the application to use multiple heterogeneous compute devices, such as multicore CPUs and GPUs, in a remote node as if they were in a local node. No communication API, such as the MPI library, is required in the application source. We implement the OpenCL framework and evaluate its performance with eleven OpenCL benchmark applications on a heterogeneous CPU/GPU cluster that consists of one host node and nine compute nodes.
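The single-system illusion described in the abstract can be pictured with a small toy model. The sketch below is not the authors' framework and uses no real OpenCL calls: the names `Device`, `ComputeNode`, `SingleSystemPlatform`, `build_cluster`, and `run_on_all_gpus` are all hypothetical, and the choice of one CPU and one GPU per compute node is an assumption (the abstract gives only the node count). It only illustrates the idea that application code enumerates one flat device list and never calls a communication API itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Device:
    """One compute device that physically lives on some node (hypothetical)."""
    node: str   # name of the node hosting the device
    kind: str   # "CPU" or "GPU"
    index: int  # position of the device within its node


@dataclass
class ComputeNode:
    """Stand-in for a remote node; a real cluster would sit behind a network."""
    name: str
    devices: List[Device]
    log: List[str] = field(default_factory=list)

    def execute(self, device: Device,
                kernel: Callable[[List[int]], List[int]],
                data: List[int]) -> List[int]:
        # In a real runtime this step would hide the messaging between nodes;
        # in this toy model it is an ordinary function call.
        self.log.append(f"{device.kind}{device.index}")
        return kernel(data)


class SingleSystemPlatform:
    """Facade that presents every device in the cluster as one flat list."""

    def __init__(self, nodes: List[ComputeNode]) -> None:
        self._nodes: Dict[str, ComputeNode] = {n.name: n for n in nodes}

    def get_devices(self, kind: Optional[str] = None) -> List[Device]:
        devices = [d for n in self._nodes.values() for d in n.devices]
        return [d for d in devices if kind is None or d.kind == kind]

    def run(self, device: Device,
            kernel: Callable[[List[int]], List[int]],
            data: List[int]) -> List[int]:
        # Routing to the owning node happens here, not in application code.
        return self._nodes[device.node].execute(device, kernel, data)


def build_cluster(num_compute_nodes: int = 9) -> SingleSystemPlatform:
    # Nine compute nodes matches the abstract; one CPU plus one GPU per node
    # is an assumption made only for this illustration.
    nodes = [
        ComputeNode(f"compute{i}",
                    [Device(f"compute{i}", "CPU", 0),
                     Device(f"compute{i}", "GPU", 0)])
        for i in range(num_compute_nodes)
    ]
    return SingleSystemPlatform(nodes)


def scale_by_two(chunk: List[int]) -> List[int]:
    """Placeholder for a kernel: doubles every element."""
    return [2 * x for x in chunk]


def run_on_all_gpus(platform: SingleSystemPlatform, data: List[int]) -> List[int]:
    """Application code: split the data across GPUs, no communication calls."""
    gpus = platform.get_devices("GPU")
    chunk = -(-len(data) // len(gpus))  # ceiling division
    out: List[int] = []
    for i, gpu in enumerate(gpus):
        out.extend(platform.run(gpu, scale_by_two, data[i * chunk:(i + 1) * chunk]))
    return out
```

In this model, `run_on_all_gpus` looks the same whether the GPUs are local or remote; only the facade knows which node owns each device.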

Published In

PPoPP '12: Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
February 2012
352 pages
ISBN:9781450311601
DOI:10.1145/2145816
  • ACM SIGPLAN Notices, Volume 47, Issue 8
    PPoPP '12
    August 2012
    334 pages
    ISSN: 0362-1340
    EISSN: 1558-1160
    DOI: 10.1145/2370036

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. OpenCL
  2. clusters
  3. heterogeneous computing
  4. programming models

Qualifiers

  • Poster

Conference

PPoPP '12
Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%

Cited By

  • (2018) Unified Cross-Platform Profiling of Parallel C++ Applications. 2018 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), pages 57-62. DOI: 10.1109/PMBS.2018.8641652. Online publication date: Nov-2018
  • (2017) A Feasible Architecture for ARM-Based Microserver Systems Considering Energy Efficiency. IEEE Access, 5, pages 4611-4620. DOI: 10.1109/ACCESS.2017.2657658. Online publication date: 2017
  • (2016) Automatic program generation for heterogeneous architectures. 2016 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pages 102-109. DOI: 10.1109/ICACCI.2016.7732032. Online publication date: Sep-2016
  • (2015) Enabling a Uniform OpenCL Device View for Heterogeneous Platforms. IEICE Transactions on Information and Systems, E98.D:4, pages 812-823. DOI: 10.1587/transinf.2014EDP7244. Online publication date: 2015
  • (2015) Silent Data Corruption (SDC) vulnerability of GPU on various GPGPU workloads. 2015 International SoC Design Conference (ISOCC), pages 11-12. DOI: 10.1109/ISOCC.2015.7401681. Online publication date: Nov-2015
  • (2015) Towards a Combined Grouping and Aggregation Algorithm for Fast Query Processing in Columnar Databases with GPUs. Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, pages 594-603. DOI: 10.1109/IPDPSW.2015.21. Online publication date: 25-May-2015
  • (2014) Graphics Processing Units and Open Computing Language for parallel computing. Computers and Electrical Engineering, 40:1, pages 241-251. DOI: 10.1016/j.compeleceng.2013.11.015. Online publication date: 1-Jan-2014
  • (2013) Complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU. ACM SIGPLAN Notices, 48:8, pages 57-68. DOI: 10.1145/2517327.2442523. Online publication date: 23-Feb-2013
  • (2013) An automatic input-sensitive approach for heterogeneous task partitioning. Proceedings of the 27th international ACM conference on International conference on supercomputing, pages 149-160. DOI: 10.1145/2464996.2465007. Online publication date: 10-Jun-2013
  • (2013) Complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU. Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming, pages 57-68. DOI: 10.1145/2442516.2442523. Online publication date: 23-Feb-2013