DOI:10.1145/2723772.2723780

Exploiting Dynamic Parallelism to Efficiently Support Irregular Nested Loops on GPUs

Published: 08 February 2015

Abstract

Graphics Processing Units (GPUs) have been used in general purpose computing for several years. The newly introduced Dynamic Parallelism feature of Nvidia's Kepler GPUs allows launching kernels from the GPU directly. However, the naïve use of this feature can cause a high number of nested kernel launches, each performing limited work, leading to GPU underutilization and poor performance. We propose workload consolidation mechanisms at different granularities to maximize the work performed by nested kernels and reduce their overhead. Our end goal is to design automatic code transformation techniques for applications with irregular nested loops.
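
To make the launch-count argument concrete, the following is a minimal host-side model in Python, not GPU code and not the authors' implementation. It assumes that a nested kernel is launched only for outer-loop iterations whose inner loop exceeds a hypothetical `threshold`, and that consolidation merges the work of `group_size` consecutive outer iterations into a single nested launch. The function names, the threshold, and the group sizes (which stand in for the unspecified "granularities") are all illustrative assumptions.

```python
from typing import List


def naive_launches(trip_counts: List[int], threshold: int) -> int:
    """Naive scheme: one nested launch per outer iteration whose
    inner loop has more than `threshold` iterations."""
    return sum(1 for n in trip_counts if n > threshold)


def consolidated_launches(trip_counts: List[int], threshold: int,
                          group_size: int) -> int:
    """Consolidated scheme (assumed): each group of `group_size`
    consecutive outer iterations issues at most one nested launch,
    covering every large inner loop found in that group."""
    launches = 0
    for start in range(0, len(trip_counts), group_size):
        group = trip_counts[start:start + group_size]
        if any(n > threshold for n in group):
            launches += 1
    return launches


def work_per_launch(trip_counts: List[int], threshold: int,
                    launches: int) -> float:
    """Average number of inner iterations handled by each nested launch."""
    work = sum(n for n in trip_counts if n > threshold)
    return work / launches if launches else 0.0


if __name__ == "__main__":
    # Hypothetical irregular inner-loop sizes for 8 outer iterations.
    counts = [3, 40, 2, 55, 60, 1, 1, 70]
    for label, launches in [
        ("naive", naive_launches(counts, 32)),
        ("groups of 4", consolidated_launches(counts, 32, 4)),
        ("groups of 8", consolidated_launches(counts, 32, 8)),
    ]:
        print(label, launches, work_per_launch(counts, 32, launches))
```

In this toy model the total delegated work stays constant while the number of launches shrinks as the group grows, so each nested launch carries more work. The model ignores the buffering and synchronization that a real consolidation scheme would need.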

Cited By

  • (2022) A Compiler Framework for Optimizing Dynamic Parallelism on GPUs. 2022 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pp. 1-13. DOI:10.1109/CGO53902.2022.9741284. Online publication date: 2-Apr-2022
  • (2020) Event detection and evolution in multi-lingual social streams. Frontiers of Computer Science, 14:5. DOI:10.1007/s11704-019-8201-6. Online publication date: 16-Mar-2020
  • (2017) Performance evaluation of unified memory and dynamic parallelism for selected parallel CUDA applications. The Journal of Supercomputing, 73:12, pp. 5378-5401. DOI:10.1007/s11227-017-2091-x. Online publication date: 1-Dec-2017

Published In

COSMIC '15: Proceedings of the 2015 International Workshop on Code Optimisation for Multi and Many Cores
February 2015
74 pages
ISBN:9781450333160
DOI:10.1145/2723772
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Dynamic Parallelism
  2. GPU
  3. Irregular Applications

Qualifiers

  • Abstract
  • Research
  • Refereed limited

Conference

COSMIC '15
