Design of a Hybrid MPI-CUDA Benchmark Suite for CPU-GPU Clusters

ABSTRACT
In the last few years, GPUs have become an integral part of HPC clusters. To test these heterogeneous CPU-GPU systems, we designed a hybrid CUDA-MPI benchmark suite consisting of three communication- and compute-intensive applications: Matrix Multiplication (MM), Needleman-Wunsch (NW), and the A-DFA compression algorithm [1]. The main goal of this work is to characterize these workloads on CPU-GPU clusters. Our benchmark applications are designed to allow cluster administrators to identify bottlenecks in the cluster, to decide whether scaling applications to multiple nodes would increase or decrease overall throughput, and to design effective scheduling policies. Our experiments show that inter-node communication can significantly degrade the throughput of communication-intensive applications. We conclude that the scalability of the applications depends primarily on two factors: the cluster configuration and the applications' characteristics.
REFERENCES
[1] M. Becchi and P. Crowley, "A-DFA: A Time- and Space-Efficient DFA Compression Algorithm for Fast Regular Expression Evaluation," ACM TACO, vol. 10, no. 1, pp. 1-26, 2013.
[2] S. B. Needleman and C. D. Wunsch, "A general method applicable to the search for similarities in the amino acid sequence of two proteins," J. of Molecular Biology, vol. 48, no. 3, pp. 443-453, 1970.
[3] "How to Optimize Data Transfers in CUDA C/C++," http://devblogs.nvidia.com/parallelforall/how-optimize-data-transfers-cuda-cc.
[4] "An Introduction to CUDA-Aware MPI," http://devblogs.nvidia.com/parallelforall/introduction-cuda-aware-mpi.