Exploiting Dynamic Parallelism to Efficiently Support Irregular Nested Loops on GPUs

Authors: Da Li, Hancheng Wu, Michela Becchi

COSMIC '15: Proceedings of the 2015 International Workshop on Code Optimisation for Multi and Many Cores

Article No.: 5, Page 1

https://doi.org/10.1145/2723772.2723780

Published: 08 February 2015

Abstract

Graphics Processing Units (GPUs) have been used in general purpose computing for several years. The newly introduced Dynamic Parallelism feature of Nvidia's Kepler GPUs allows launching kernels from the GPU directly. However, the naïve use of this feature can cause a high number of nested kernel launches, each performing limited work, leading to GPU underutilization and poor performance. We propose workload consolidation mechanisms at different granularities to maximize the work performed by nested kernels and reduce their overhead. Our end goal is to design automatic code transformation techniques for applications with irregular nested loops.
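
The launch-count argument in the abstract can be illustrated with a small host-side model. This is a sketch under stated assumptions, not the authors' implementation and not GPU code. The names `LaunchStats`, `naive_launches`, `consolidated_launches`, `threshold`, and `group_size` are hypothetical. `group_size` stands in for a consolidation granularity (for example a warp or a thread block), which the abstract does not specify. The model only counts how many nested launches an irregular nested loop would trigger when each outer iteration launches on its own, compared with when a group of outer iterations buffers its work and issues one launch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cost model (CPU only): counts nested launches, it does not run kernels.
struct LaunchStats {
    std::size_t launches;    // number of nested (child) launches issued
    std::size_t work_items;  // inner-loop iterations covered by those launches
};

// Naive strategy: every outer iteration whose inner loop is larger than
// `threshold` issues its own nested launch.
LaunchStats naive_launches(const std::vector<std::size_t>& inner_sizes,
                           std::size_t threshold) {
    LaunchStats stats{0, 0};
    for (std::size_t n : inner_sizes) {
        if (n > threshold) {
            ++stats.launches;
            stats.work_items += n;
        }
    }
    return stats;
}

// Consolidated strategy: outer iterations are split into groups of
// `group_size`. Each group buffers the work of its large iterations and
// issues at most one nested launch covering all of it.
LaunchStats consolidated_launches(const std::vector<std::size_t>& inner_sizes,
                                  std::size_t threshold,
                                  std::size_t group_size) {
    assert(group_size > 0);
    LaunchStats stats{0, 0};
    for (std::size_t begin = 0; begin < inner_sizes.size(); begin += group_size) {
        const std::size_t end = std::min(begin + group_size, inner_sizes.size());
        std::size_t buffered = 0;
        for (std::size_t i = begin; i < end; ++i) {
            if (inner_sizes[i] > threshold) {
                buffered += inner_sizes[i];
            }
        }
        if (buffered > 0) {  // one launch per group that has any buffered work
            ++stats.launches;
            stats.work_items += buffered;
        }
    }
    return stats;
}
```

For inner-loop sizes `{10, 1, 7, 2, 9, 3, 8, 1}` with `threshold = 4`, the naive model issues 4 launches, while the consolidated model with `group_size = 4` issues 2 launches. Both cover the same 34 work items, so each consolidated launch performs more work.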

Cited By

Olabi M, Luna J, Mutlu O, Hwu W, Hajj I (2022). A Compiler Framework for Optimizing Dynamic Parallelism on GPUs. 2022 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pp. 1-13. Online publication date: 2-Apr-2022.
https://doi.org/10.1109/CGO53902.2022.9741284
Liu Y, Peng H, Li J, Song Y, Li X (2020). Event detection and evolution in multi-lingual social streams. Frontiers of Computer Science 14:5. Online publication date: 16-Mar-2020.
https://doi.org/10.1007/s11704-019-8201-6
Jarząbek Ł, Czarnul P (2017). Performance evaluation of unified memory and dynamic parallelism for selected parallel CUDA applications. The Journal of Supercomputing 73:12, pp. 5378-5401. Online publication date: 1-Dec-2017.
https://dl.acm.org/doi/10.1007/s11227-017-2091-x

Index Terms

1. Computing methodologies
  1. Concurrent computing methodologies
    1. Concurrent programming languages
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types
        1. Concurrent programming languages

Published In

COSMIC '15: Proceedings of the 2015 International Workshop on Code Optimisation for Multi and Many Cores

February 2015, 74 pages

ISBN: 9781450333160

DOI: 10.1145/2723772

Program Chairs: Zheng Wang (Lancaster University), Pavlos Petoumenos (The University of Edinburgh), Hugh Leather (The University of Edinburgh)

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Abstract
Research
Refereed limited

Funding Sources

NEC Labs America
NSF
Nvidia

Conference

COSMIC '15: International Workshop on Code Optimisation for Multi and Many Cores

February 8, 2015

San Francisco Bay Area, CA, USA

Article Metrics

Total Citations: 3
Total Downloads: 136
Downloads (last 12 months): 2
Downloads (last 6 weeks): 0

Reflects downloads up to 11 Feb 2025
