research-article

Scan Stack: A Search-based Concurrent Stack for GPU

Authors:
Noah South
University of Mississippi, Oxford, MS, USA
https://orcid.org/0009-0002-7720-3300

Byunghyun Jang
University of Mississippi, Oxford, MS, USA
https://orcid.org/0000-0001-8316-4323

ACM SE '23: Proceedings of the 2023 ACM Southeast Conference, April 2023, Pages 10–19, https://doi.org/10.1145/3564746.3587018

Published: 12 June 2023

ABSTRACT

Concurrent data structures play a critical role in the overall performance of GPGPU applications. The stack is one of the most basic data structures and is used in numerous applications where data is processed in a Last In First Out (LIFO) fashion. Although concurrent stacks are well researched for multi-core CPUs, there is little research on converting CPU stacks into a GPU-friendly form. In this paper, we propose a concurrent search-based GPU stack named Scan Stack. The proposed stack is designed to take advantage of GPU memory access patterns, memory coalescing, and thread structures (i.e., warps) to increase throughput. Our experiments on an NVIDIA RTX 3090 show that the proposed Scan Stack significantly improves throughput and scalability for all benchmarks when the search area is reduced. The greatest improvements appear when elimination is possible, where the improvement reaches nearly 39 times what a non-optimized structure achieves.
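The abstract does not spell out the algorithm, so the following is only a rough, host-side C++ illustration of what a search-based array stack could look like, not the paper's GPU implementation. It assumes a fixed-size array of slots, a sentinel value for empty slots, and a shared hint index that narrows the scan range. The class name `ScanStackSketch`, the `kEmpty` sentinel, and the hint are all hypothetical. Warp-level cooperation, coalesced accesses, and elimination are not modeled here.

```cpp
#include <atomic>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical CPU-side sketch; names and layout are assumptions,
// not the design described in the paper.
class ScanStackSketch {
public:
    // Sentinel marking an empty slot (stored values must be >= 0).
    static constexpr int kEmpty = -1;

    explicit ScanStackSketch(std::size_t capacity)
        : slots_(capacity), hint_(0) {
        for (auto& s : slots_) s.store(kEmpty, std::memory_order_relaxed);
    }

    // Scan upward from the hint for an empty slot and claim it with CAS.
    bool push(int value) {
        std::size_t start = hint_.load(std::memory_order_relaxed);
        for (std::size_t i = start; i < slots_.size(); ++i) {
            int expected = kEmpty;
            if (slots_[i].compare_exchange_strong(expected, value)) {
                hint_.store(i + 1, std::memory_order_relaxed);
                return true;
            }
        }
        return false;  // no free slot found at or above the hint
    }

    // Scan downward from the hint for an occupied slot and take it with CAS.
    std::optional<int> pop() {
        std::size_t start = hint_.load(std::memory_order_relaxed);
        if (start > slots_.size()) start = slots_.size();
        for (std::size_t i = start; i-- > 0;) {
            int v = slots_[i].load();
            if (v != kEmpty && slots_[i].compare_exchange_strong(v, kEmpty)) {
                hint_.store(i, std::memory_order_relaxed);
                return v;
            }
        }
        return std::nullopt;  // nothing found below the hint
    }

private:
    std::vector<std::atomic<int>> slots_;
    std::atomic<std::size_t> hint_;  // heuristic only; may be stale under contention
};
```

Used from a single thread, this sketch behaves as an ordinary LIFO stack. Under contention the hint can be stale, so strict LIFO ordering is not guaranteed. The sketch is meant only to convey the idea of searching for a slot within a bounded region rather than contending on a single top pointer.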

Index Terms

Scan Stack: A Search-based Concurrent Stack for GPU
1. Computing methodologies
  1. Concurrent computing methodologies
2. Information systems
  1. Data management systems
    1. Data structures

Published in
ACM SE '23: Proceedings of the 2023 ACM Southeast Conference
April 2023
216 pages
ISBN:9781450399210
DOI:10.1145/3564746
Chair:
Kuang-Nan Chang,
Organizing Committee Chair:
Eric Gamess,
Program Chair:
Chi Shen
Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
GPU
concurrency
non-blocking
data structure
stack
Acceptance Rates
ACM SE '23 Paper Acceptance Rate: 31 of 71 submissions, 44%
Overall Acceptance Rate: 178 of 377 submissions, 47%