Abstract
We introduce a parallel kd-tree construction method for 3-dimensional points on a GPU which employs a sorting algorithm that maintains high parallelism throughout construction. Typically, large arrays in the upper levels of a kd-tree do not yield high performance when computing each node in one thread. Conversely, small arrays in the lower levels of the tree do not benefit from typical parallel sorts. To address these issues, the proposed sorting approach uses a modified parallel sort on the upper levels before switching to basic parallelization on the lower levels. Our work focuses on 3D point registration and our results indicate that a speed gain by a factor of 100 can be achieved in comparison to a naive parallel algorithm for a typical scene.










References
Arya, S., Mount, D.M., Netanyahu, N.S., Silverman, R., Wu, A.Y.: An optimal algorithm for approximate nearest neighbor searching in fixed dimensions. J. ACM 45(6), 891–923 (1998)
Atkinson, M.D., Sack, J.R., Santoro, N., Strothotte, T.: Min-max heaps and generalized priority queues. Commun. ACM 29(10), 996–1000 (1986)
Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509–517 (1975)
Bentley, J.L.: Multidimensional divide-and-conquer. Commun. ACM 23(4), 214–229 (1980)
Friedman, J.H., Bentley, J.L., Finkel, R.A.: An algorithm for finding best matches in logarithmic expected time. ACM Trans. Math. Softw. 3(3), 209–226 (1977)
Garcia, V., Debreuve, E., Barlaud, M.: Fast k nearest neighbor search using GPU. In: 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (2008)
Garrett, T., Radkowski, R., Sheaffer, J.: GPU-accelerated descriptor extraction process for 3D registration in augmented reality. In: 23rd International Conference on Pattern Recognition, Cancun, Mexico (2016)
Ha, L., Kruger, L., Silva, C.: Fast four-way parallel radix sorting on GPUs. Comput. Graph. Forum 28(8), 2368–2378 (2009)
Harada, T., Howes, L.: Introduction to GPU radix sort. In: Heterogeneous Computing with OpenCL. Morgan Kaufmann (2011)
Harris, M.: Maxwell: The Most Advanced CUDA GPU Ever Made. NVIDIA, Santa Clara (2014)
Havran, V.: Heuristic ray shooting algorithms. Ph.D. thesis, Czech Technical University (2001)
Hu, L., Nooshabadi, S., Ahmadi, M.: Massively parallel KD-tree construction and nearest neighbor search algorithms. In: 2015 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 2752–2755 (2015)
Karras, T.: Maximizing parallelism in the construction of BVHs, Octrees, and k-d trees. In: Proceedings of the Fourth ACM SIGGRAPH / Eurographics Conference on High-Performance Graphics, EGGH-HPG’12, pp. 33–37 (2012)
Leite, P., Teixeira, J.M., Farias, T., Reis, B., Teichrieb, V., Kelner, J.: Nearest neighbor searches on the GPU. Int. J. Parallel Program. 40(3), 313–330 (2012)
Leite, P.J.S., Teixeira, J.M.X.N., de Farias, T.S.M.C., Teichrieb, V., Kelner, J.: Massively parallel nearest neighbor queries for dynamic point clouds on the GPU. In: 2009 21st International Symposium on Computer Architecture and High Performance Computing, pp. 19–25 (2009)
Merrill, D.G., Grimshaw, A.S.: Revisiting sorting for GPGPU stream architectures. In: Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, PACT ’10, pp. 545–546 (2010)
Qiu, D., May, S., Nüchter, A.: GPU-accelerated nearest neighbor search for 3D registration. In: Proceedings of Computer Vision Systems: 7th International Conference on Computer Vision Systems, ICVS 2009, pp. 194–203. Springer, Berlin (2009)
Radkowski, R.: Object tracking with a range camera for augmented reality assembly assistance. J. Comput. Inf. Sci. Eng. 16(1), 1–8 (2016)
Radkowski, R., Garrett, T., Ingebrand, J., Wehr, D.: Trackingexpert—a versatile tracking toolbox for augmented reality. In: IDETC/CIE 2016, the ASME 2016 International Design Engineering Technical Conferences & Computers and Information in Engineering Conference, Charlotte, NC (2016)
Satish, N., Harris, M., Garland, M.: Designing efficient sorting algorithms for manycore GPUs. In: 2009 IEEE International Symposium on Parallel Distributed Processing, pp. 1–10 (2009)
Singh, D.P., Joshi, I., Choudhary, J.: Survey of GPU based sorting algorithms. Int. J. Parallel Program. (2017). https://doi.org/10.1007/s10766-017-0502-5
Zhou, K., Hou, Q., Wang, R., Guo, B.: Real-time KD-tree construction on graphics hardware. ACM Trans. Graph. 27(5), 126:1–126:11 (2008)
Appendix
1.1 Median Splitting Determination
I. To determine which chunk a point belongs to, we use the technique described in [7] to compute a median. In brief, we consider the width w of a chunk to be a real number \(w=\frac{N}{2^l}\), where l is the zero-indexed tree level. Therefore, given a particular index i, we can determine the chunk by \(c = \frac{i}{w}\).
II. During the histogram calculation, it is also necessary to determine whether a point is a median splitting element. This is decided with the following criterion, as per [7]:
$$\begin{aligned} splitting = {\left\{ \begin{array}{ll} true, & \text{if } \left\lceil \frac{i}{w} \right\rceil < \frac{i+1}{w} \wedge i \ne 0 \\ false, & \text{otherwise} \end{array}\right. } \end{aligned}$$
III. Finally, we need the ability to calculate the starting index of a chunk, excluding the splitting element. Because the chunk width is constant, the starting index \(i_s\) of a chunk c can be computed by:
$$\begin{aligned} i_s = {\left\{ \begin{array}{ll} 0, & \text{if } c = 0 \\ \left\lfloor w \cdot c \right\rfloor + 1, & \text{if } c \ne 0 \end{array}\right. } \end{aligned}$$
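The three rules above can be sketched on the host side as follows. This is a minimal illustration in Python, not the authors' GPU implementation. The function names are hypothetical. Rounding the chunk index down to an integer is an assumption, since the text gives \(c = \frac{i}{w}\) without stating how it is rounded, and the starting index is taken as \(\lfloor w \cdot c \rfloor + 1\) for every chunk \(c \ne 0\).

```python
import math


def chunk_width(n: int, level: int) -> float:
    """Real-valued chunk width w = N / 2^l for the zero-indexed level l."""
    return n / (1 << level)


def chunk_of(i: int, w: float) -> int:
    """Chunk containing index i (rule I). The floor is an assumption."""
    return math.floor(i / w)


def is_splitting(i: int, w: float) -> bool:
    """Rule II: index i is a median splitting element if
    ceil(i / w) < (i + 1) / w and i != 0."""
    return i != 0 and math.ceil(i / w) < (i + 1) / w


def chunk_start(c: int, w: float) -> int:
    """Rule III: first index of chunk c, skipping the splitting element."""
    return 0 if c == 0 else math.floor(w * c) + 1


if __name__ == "__main__":
    n, level = 8, 2                      # 8 points, tree level 2
    w = chunk_width(n, level)            # w = 2.0
    splits = [i for i in range(n) if is_splitting(i, w)]
    starts = [chunk_start(c, w) for c in range(1 << level)]
    print(splits)                        # [2, 4, 6]
    print(starts)                        # [0, 3, 5, 7]
```

For N = 8 at level 2, the splitting elements sit at indices 2, 4 and 6, and each chunk other than the first starts one position after a splitting element, which matches the "excluding the splitting element" wording of rule III. Because w is N divided by a power of two, the division that produces w is exact in floating point for any N a double can represent.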
1.2 Experimental Data
Cite this article
Wehr, D., Radkowski, R. Parallel kd-Tree Construction on the GPU with an Adaptive Split and Sort Strategy. Int J Parallel Prog 46, 1139–1156 (2018). https://doi.org/10.1007/s10766-018-0571-0
DOI: https://doi.org/10.1007/s10766-018-0571-0