poster

A scalable queue for work distribution on GPUs

Authors:

Markus SteinbergerAuthors Info & Claims

PPoPP '18: Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Pages 401 - 402

https://doi.org/10.1145/3178487.3178526

Published: 10 February 2018 Publication History

Get Access

Abstract

Harnessing the power of massively parallel devices like the graphics processing unit (GPU) is difficult for algorithms that show dynamic or inhomogeneous workloads. To achieve high performance, such advanced algorithms require scalable, concurrent queues to collect and distribute work. We present a new concurrent work queue, the Broker Queue, a highly efficient, linearizable queue for fine-granular work distribution on the GPU. We evaluate its usability and benefits in contrast to existing queuing algorithms. Our queue is up to one order of magnitude faster than non-blocking queues, and outperforms simpler queue designs that are unfit for fine-granular work distribution.

References

[1]

Allan Gottlieb, Boris D. Lubachevsky, and Larry Rudolph. 1983. Basic Techniques for the Efficient Coordination of Very Large Numbers of Cooperating Sequential Processors. ACM Trans. Program. Lang. Syst. 5, 2 (April 1983), 164--189.

Digital Library

Google Scholar

[2]

Danny Hendler, Itai Incze, Nir Shavit, and Moran Tzafrir. 2010. Flat combining and the synchronization-parallelism tradeoff. In Proceedings of the 22nd ACM symposium on Parallelism in algorithms and architectures (SPAA '10). ACM, New York, NY, USA, 355--364.

Digital Library

Google Scholar

[3]

Adam Morrison and Yehuda Afek. 2013. Fast Concurrent Queues for x86 Processors. SIGPLAN Not. 48, 8 (Feb. 2013), 103--112.

Digital Library

Google Scholar

[4]

Markus Steinberger, Michael Kenzel, Pedro Boechat, Bernhard Kerbl, Mark Dokter, and Dieter Schmalstieg. 2014. Whippletree: Task-based Scheduling of Dynamic Workloads on the GPU. ACM Trans. Graph. 33, 6, Article 228 (Nov. 2014), 11 pages.

Digital Library

Google Scholar

Index Terms

A scalable queue for work distribution on GPUs
1. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Process management
        Scheduling
2. Theory of computation
  1. Design and analysis of algorithms
    1. Parallel algorithms
      1. Massively parallel algorithms

Recommendations

The Broker Queue: A Fast, Linearizable FIFO Queue for Fine-Granular Work Distribution on the GPU
ICS '18: Proceedings of the 2018 International Conference on Supercomputing

Harnessing the power of massively parallel devices like the graphics processing unit (GPU) is difficult for algorithms that show dynamic or inhomogeneous workloads. To achieve high performance, such advanced algorithms require scalable, concurrent ...
A scalable queue for work distribution on GPUs
PPoPP '18

Harnessing the power of massively parallel devices like the graphics processing unit (GPU) is difficult for algorithms that show dynamic or inhomogeneous workloads. To achieve high performance, such advanced algorithms require scalable, concurrent ...
Performance Evaluation of Concurrent Lock-free Data Structures on GPUs
ICPADS '12: Proceedings of the 2012 IEEE 18th International Conference on Parallel and Distributed Systems

Graphics processing units (GPUs) have emerged as a strong candidate for high-performance computing. While regular data-parallel computations with little or no synchronization are easy to map on the GPU architectures, it is a challenge to scale up ...

Comments

Information & Contributors

Information

Published In

PPoPP '18: Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

February 2018

442 pages

ISBN:9781450349826

DOI:10.1145/3178487

General Chair:
Andreas Krall
Vienna University of Technology, Austria
,
Program Chair:
Thomas R. Gross
ETH Zürich, Switzerland

ACM SIGPLAN Notices Volume 53, Issue 1
PPoPP '18
January 2018
426 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/3200691
Editor:
Matthew Fluet
Rodchester Institude of Technology
Issue’s Table of Contents

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 February 2018

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Poster

Conference

PPoPP '18

Sponsor:

PPoPP '18: 23nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

February 24 - 28, 2018

Vienna, Austria

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
268
Total Downloads

Downloads (Last 12 months)20
Downloads (Last 6 weeks)0

Reflects downloads up to 03 Mar 2025

Other Metrics

View Author Metrics

Citations

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Abstract

References

Index Terms

Recommendations

The Broker Queue: A Fast, Linearizable FIFO Queue for Fine-Granular Work Distribution on the GPU