ABSTRACT
Recent research has shown promising results on using graphics processing units (GPUs) to accelerate general-purpose computation. However, today's GPUs do not support recursive functions. As a result, for inherently recursive algorithms such as tree traversal, GPU programmers need to explicitly use stacks to emulate the recursion. Parallelizing such stack-based implementations on the GPU increases the programming difficulty; moreover, it is unclear how to improve the efficiency of such parallel implementations. As a first step toward addressing both ease of programming and efficiency, we propose three parallel stack implementation alternatives that differ in the granularity of stack sharing. Taking tree traversals as an example, we study the performance tradeoffs among these alternatives and analyze their behavior in various situations. Our results could be useful to both GPU programmers and GPU compiler writers.
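To make the idea of emulating recursion with an explicit stack concrete, the following is a minimal host-side C++ sketch of a range count over a binary search tree. It is not taken from the paper and does not implement any of its three stack-sharing alternatives; it only shows the basic pattern that a single GPU thread would run in place of a recursive call. The `Node` layout, the name `count_in_range`, and the bound `kMaxStackDepth` are assumptions made for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical node layout: children are indices into a flat array,
// and -1 means "no child". Flat arrays are a common way to ship trees to a GPU.
struct Node {
    int key;
    int left;
    int right;
};

// Assumed upper bound on the number of pending nodes; a real kernel would
// size this from the known tree height.
constexpr int kMaxStackDepth = 64;

// Counts the keys in [lo, hi] of a binary search tree without recursion.
// Returns -1 if the fixed-size stack would overflow.
int count_in_range(const std::vector<Node>& tree, int root, int lo, int hi) {
    assert(root < static_cast<int>(tree.size()));

    int stack[kMaxStackDepth];  // explicit stack replacing the call stack
    int top = 0;                // number of entries currently on the stack
    int count = 0;

    if (root >= 0) stack[top++] = root;

    while (top > 0) {
        const Node& n = tree[stack[--top]];  // pop the next pending node
        if (n.key >= lo && n.key <= hi) ++count;

        // Push only the subtrees that can still contain keys in [lo, hi].
        if (n.left >= 0 && n.key > lo) {
            if (top == kMaxStackDepth) return -1;
            stack[top++] = n.left;
        }
        if (n.right >= 0 && n.key < hi) {
            if (top == kMaxStackDepth) return -1;
            stack[top++] = n.right;
        }
    }
    return count;
}
```

In this sketch each traversal owns its stack privately. How such stacks are shared among threads is the design dimension the abstract refers to as the granularity of stack sharing.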