Abstract
The B\(^+\)-tree is an important index structure in data warehousing and database management systems. As new hardware technologies emerge, the B\(^+\)-tree needs to be revisited to take full advantage of the available hardware resources. In this paper, we focus on optimization techniques that increase the search performance of B\(^+\)-trees on the coupled CPU-GPU architecture. First, we propose a hierarchical searching approach on a single coupled GPU to deal efficiently with the leaf nodes of B\(^+\)-trees. It flexibly determines how many work items in a work group cooperate to search for one key, which reduces irregular memory accesses and divergent branches within the work group. Second, we present a co-processing pipeline method on the coupled architecture. The CPU and the integrated GPU process the sorting and searching tasks simultaneously to hide the sorting latency and part of the searching latency. A distribution model is designed to support a workload balancing strategy based on real-time performance. Our performance study shows that the hierarchical searching scheme improves performance on the GPU by up to 36% compared with a baseline algorithm that uses a fixed number of work items, and that the co-processing pipeline method further increases throughput by a factor of 1.8. To the best of our knowledge, this paper is the first study to consider both the CPU and the coupled GPU to optimize B\(^+\)-tree searches.
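As a rough illustration of workload balancing based on real-time performance, the sketch below splits a batch of search keys between the CPU and the GPU in proportion to their most recently measured throughput. It is not the distribution model from the paper, whose details the abstract does not give; the function names, the proportional rule, and the smoothing factor are all assumptions made for this example.

```python
def split_batch(num_keys, cpu_keys_per_sec, gpu_keys_per_sec):
    """Split a batch of keys in proportion to measured device throughput.

    Hypothetical helper: the proportional rule is an assumption,
    not the distribution model described in the paper.
    Returns (cpu_keys, gpu_keys).
    """
    total = cpu_keys_per_sec + gpu_keys_per_sec
    if total <= 0:
        raise ValueError("at least one device must report positive throughput")
    gpu_keys = round(num_keys * gpu_keys_per_sec / total)
    return num_keys - gpu_keys, gpu_keys


def update_throughput(previous, keys_done, seconds, smoothing=0.5):
    """Blend the newest measurement into a running throughput estimate.

    Uses simple exponential smoothing; the factor 0.5 is an arbitrary
    choice for illustration.
    """
    measured = keys_done / seconds
    return smoothing * measured + (1.0 - smoothing) * previous


if __name__ == "__main__":
    # Example: the GPU was three times as fast as the CPU on the last batch.
    cpu_share, gpu_share = split_batch(1000, 100.0, 300.0)
    print(cpu_share, gpu_share)
```

In a pipeline of this kind, each device would report how many keys it finished and how long it took after every batch, and the updated estimates would drive the split for the next batch.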
Supported by the National Key R&D Program of China (No. 2017YFC0804004), and a grant from the Capital Science and Technology Innovation Vouchers of China.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Huang, H., Luan, H. (2020). Optimizing B\(^+\)-Tree Searches on Coupled CPU-GPU Architectures. In: Qiu, M. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2020. Lecture Notes in Computer Science, vol 12452. Springer, Cham. https://doi.org/10.1007/978-3-030-60245-1_28
DOI: https://doi.org/10.1007/978-3-030-60245-1_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60244-4
Online ISBN: 978-3-030-60245-1
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)