DOI: 10.1145/3564746.3587018
research-article

Scan Stack: A Search-based Concurrent Stack for GPU

Published: 12 June 2023

ABSTRACT

Concurrent data structures play a critical role in the overall performance of GPGPU applications. The stack is one of the basic data structures and has numerous applications in which data is processed in Last In, First Out (LIFO) order. Although concurrent stacks are well researched for multi-core CPUs, little research addresses the conversion of CPU stacks into a GPU-friendly form. In this paper, we propose a concurrent, search-based GPU stack named Scan Stack. The proposed stack is designed to take advantage of GPU memory access patterns, memory coalescing, and thread structures (i.e., warps) to increase throughput. Our experiments on an NVIDIA RTX 3090 show that the proposed Scan Stack significantly improves throughput and scalability on all benchmarks when the search area is reduced. The greatest improvements appear when elimination is possible, where the improvement reaches nearly 39 times what a non-optimized structure achieves.
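The abstract does not spell out the algorithm, so the following is only a rough illustration of what "search-based" could mean for a bounded stack: operations scan an array of slots rather than contending on a single top pointer, and a shared hint narrows the area that has to be searched. Everything here is an assumption made for illustration, including the names `ScanStackSketch`, `hint_`, and `kEmpty`; it is written in host-side C++ with `std::atomic` instead of GPU code, and it omits warps, coalescing, and elimination entirely. It should not be read as the paper's implementation.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical sketch, NOT the paper's algorithm: a bounded stack whose
// push and pop search an array of slots for a usable entry.
template <std::size_t N>
class ScanStackSketch {
    static constexpr int kEmpty = -1;            // sentinel: stored values must be >= 0
    std::array<std::atomic<int>, N> slots_;
    std::atomic<std::size_t> hint_{0};           // approximate top; may be stale under concurrency

public:
    ScanStackSketch() {
        for (auto& s : slots_) s.store(kEmpty);
    }

    // Scan upward from the hint and claim the first empty slot with a CAS.
    bool push(int value) {
        for (std::size_t i = hint_.load(); i < N; ++i) {
            int expected = kEmpty;
            if (slots_[i].compare_exchange_strong(expected, value)) {
                hint_.store(i + 1);              // later searches start just above this slot
                return true;
            }
        }
        return false;                            // no empty slot found at or above the hint
    }

    // Scan downward from the hint and take the first occupied slot.
    bool pop(int& out) {
        for (std::size_t i = hint_.load(); i-- > 0;) {
            int v = slots_[i].exchange(kEmpty);
            if (v != kEmpty) {
                hint_.store(i);                  // the slot just emptied is the new search start
                out = v;
                return true;
            }
        }
        return false;                            // nothing found below the hint
    }
};

// Single-threaded usage: values come back in LIFO order.
inline void lifo_demo() {
    ScanStackSketch<8> s;
    s.push(1);
    s.push(2);
    s.push(3);
    int a = 0, b = 0, c = 0;
    bool ok = s.pop(a) && s.pop(b) && s.pop(c);
    assert(ok && a == 3 && b == 2 && c == 1);
}
```

Used from one thread, the sketch behaves as an ordinary LIFO stack. With several threads the hint can be stale, so a pop may miss a value sitting above the hint and ordering is only approximately LIFO; a real design would have to address that, and the abstract suggests that how far each search has to look is where much of the performance is decided.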


      • Published in

        ACM SE '23: Proceedings of the 2023 ACM Southeast Conference
        April 2023
        216 pages
        ISBN:9781450399210
        DOI:10.1145/3564746

        Copyright © 2023 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        ACM SE '23 paper acceptance rate: 31 of 71 submissions, 44%. Overall acceptance rate: 178 of 377 submissions, 47%.