Revisiting hash join on graphics processors: a decade later

Paul, Johns; He, Bingsheng; Lu, Shengliang; Lau, Chiew Tong

doi:10.1007/s10619-019-07280-z

Published: 08 January 2020

Distributed and Parallel Databases, volume 38, pages 771–793 (2020)

Johns Paul ORCID: orcid.org/0000-0002-3473-2264¹,
Bingsheng He²,
Shengliang Lu² &
Chiew Tong Lau¹

Abstract

Over the last decade, significant research effort has been put into improving the performance of the hash join operation on GPUs. Over the same period, there have been significant changes to the GPU architecture. Hence, in this paper, we first revisit the major GPU hash join implementations of the last decade and detail how they take advantage of different GPU architecture features. We then perform a comprehensive performance evaluation of these implementations using different generations of GPUs released over the last decade, which helps to shed light on the impact of different architecture features and to identify the factors guiding the choice of these features. We then study how data characteristics like skew and match rate impact the performance of GPU hash join implementations and propose techniques to improve the performance of existing implementations under such conditions. Finally, we perform an in-depth comparison of the performance and cost-efficiency of GPU hash join implementations against a state-of-the-art CPU implementation.

Acknowledgements

This work is supported by a MoE AcRF Tier 1 grant (T1 251RES1824) and a MoE AcRF Tier 2 grant (MOE2017-T2-1-122) in Singapore.

Author information

Authors and Affiliations

Nanyang Technological University, Singapore, Singapore
Johns Paul & Chiew Tong Lau
National University of Singapore, Singapore, Singapore
Bingsheng He & Shengliang Lu

Corresponding author

Correspondence to Johns Paul.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Paul, J., He, B., Lu, S. et al. Revisiting hash join on graphics processors: a decade later. Distrib Parallel Databases 38, 771–793 (2020). https://doi.org/10.1007/s10619-019-07280-z

Published: 08 January 2020
Issue Date: December 2020
DOI: https://doi.org/10.1007/s10619-019-07280-z
