Optimizing B\(^+\)-Tree Searches on Coupled CPU-GPU Architectures

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12452)

Abstract

The B\(^+\)-tree is an important index structure in data warehousing and database management systems. As new hardware technologies emerge, the B\(^+\)-tree needs to be revisited to take full advantage of hardware resources. In this paper, we focus on optimization techniques that increase the search performance of B\(^+\)-trees on the coupled CPU-GPU architecture. First, we propose a hierarchical searching approach on a single coupled GPU to deal efficiently with the leaf nodes of B\(^+\)-trees. It adopts a flexible strategy for determining how many work items in a work group are used to search for one key, which reduces irregular memory accesses and divergent branches within the work group. Second, we present a co-processing pipeline method on the coupled architecture. The CPU and the integrated GPU process the sorting and searching tasks simultaneously to hide the sorting latency and part of the searching latency. A distribution model is designed to support a workload-balancing strategy based on real-time performance. Our performance study shows that the hierarchical searching scheme improves GPU performance by up to 36% over a baseline algorithm that uses a fixed number of work items, and that the co-processing pipeline method further increases the throughput by a factor of 1.8. To the best of our knowledge, this paper is the first study to consider both the CPU and the coupled GPU to optimize B\(^+\)-tree searches.
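
The abstract does not spell out the distribution model, so the following is only a minimal sketch of one way a workload split based on real-time performance could look, not the method of the paper. It assumes that each device reports how many keys it searched and how long that took, and that the next batch is divided in proportion to the smoothed throughput. The names `split_batch` and `update_rate` and the smoothing weight `alpha` are hypothetical.

```python
def split_batch(batch_size, cpu_rate, gpu_rate):
    """Divide a batch of search keys between CPU and GPU.

    Assumption for illustration: each device receives a share
    proportional to its estimated throughput (keys per second).
    Returns (cpu_keys, gpu_keys).
    """
    total = cpu_rate + gpu_rate
    if total <= 0:
        # No usable measurements yet: fall back to an even split.
        cpu_keys = batch_size // 2
    else:
        cpu_keys = round(batch_size * cpu_rate / total)
    return cpu_keys, batch_size - cpu_keys


def update_rate(old_rate, keys_done, seconds, alpha=0.5):
    """Blend the latest observed throughput into a running estimate.

    alpha is a hypothetical smoothing weight: 1.0 trusts only the
    newest measurement, 0.0 never updates the estimate.
    """
    if seconds <= 0:
        return old_rate
    observed = keys_done / seconds
    return alpha * observed + (1 - alpha) * old_rate


if __name__ == "__main__":
    cpu_rate, gpu_rate = 1.0, 3.0           # illustrative starting estimates
    print(split_batch(1000, cpu_rate, gpu_rate))
    # After a batch, fold the measured time back into the estimate.
    gpu_rate = update_rate(gpu_rate, keys_done=750, seconds=200.0)
    print(split_batch(1000, cpu_rate, gpu_rate))
```

Under these assumptions, a device that slows down during one batch receives a smaller share of the next one, which is the general idea behind balancing on measured rather than predicted performance.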

Supported by the National Key R&D Program of China (No. 2017YFC0804004) and a grant from the Capital Science and Technology Innovation Vouchers of China.

Author information

Corresponding author

Correspondence to Hua Luan.

Copyright information

© 2020 Springer Nature Switzerland AG

Cite this paper

Huang, H., Luan, H. (2020). Optimizing B\(^+\)-Tree Searches on Coupled CPU-GPU Architectures. In: Qiu, M. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2020. Lecture Notes in Computer Science, vol 12452. Springer, Cham. https://doi.org/10.1007/978-3-030-60245-1_28
