
Revisiting hash join on graphics processors: a decade later

Distributed and Parallel Databases

Abstract

Over the last decade, significant research effort has gone into improving the performance of the hash join operation on GPUs. Over the same period, the GPU architecture has changed significantly. In this paper, we first revisit the major GPU hash join implementations of the last decade and detail how they take advantage of different GPU architecture features. We then perform a comprehensive performance evaluation of these implementations on different generations of GPUs released over the last decade, which sheds light on the impact of different architecture features and identifies the factors guiding the choice of these features. Next, we study how data characteristics such as skew and match rate affect the performance of GPU hash join implementations, and we propose techniques to improve the performance of existing implementations under such conditions. Finally, we perform an in-depth comparison of the performance and cost-efficiency of GPU hash join implementations against a state-of-the-art CPU implementation.


Acknowledgements

This work is supported by a MoE AcRF Tier 1 grant (T1 251RES1824) and a MoE AcRF Tier 2 Grant (MOE2017-T2-1-122) in Singapore.

Author information

Corresponding author

Correspondence to Johns Paul.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Paul, J., He, B., Lu, S. et al. Revisiting hash join on graphics processors: a decade later. Distrib Parallel Databases 38, 771–793 (2020). https://doi.org/10.1007/s10619-019-07280-z

  • DOI: https://doi.org/10.1007/s10619-019-07280-z
