Main-memory foreign key joins on advanced processors: design and re-evaluations for OLAP workloads

Zhang, Yansong; Zhang, Yu; Zhou, Xuan; Lu, Jiaheng

doi:10.1007/s10619-018-7226-4


Published: 23 May 2018

Volume 37, pages 469–506, (2019)

Distributed and Parallel Databases


Abstract

The hash join algorithm family is one of the leading techniques for equi-join performance evaluation. OLAP systems borrow this line of research to efficiently implement foreign key joins between dimension tables and large fact tables. From the perspective of data warehouse schema and workload features, the hash join algorithm can be further simplified with multidimensional mapping, and foreign key join algorithms can be evaluated from multiple perspectives rather than from performance alone. In this paper, we introduce the surrogate key index oriented foreign key join, a schema-conscious design customized for OLAP workloads, and use it to comprehensively evaluate how state-of-the-art join algorithms perform in OLAP workloads. Our experiments and analysis give the following insights: (1) a foreign key join algorithm customized for OLAP workloads can push join performance beyond that of general-purpose hash joins; (2) under a fine-grained micro join benchmark, each join algorithm shows strong and weak performance regions dominated by the cache locality ratio input_size/cache_size; (3) the simple hardware-oblivious shared hash table join outperforms the complex hardware-conscious radix partitioning hash join in most benchmark cases; (4) the customized foreign key join algorithm with a surrogate key index reduces algorithm complexity for hardware accelerators and makes it easy to implement on different hardware accelerators. Overall, we argue that improving join performance is a systematic effort rather than merely a matter of hardware-conscious algorithm optimization, and that OLAP domain knowledge makes the surrogate key index effective for foreign key joins in data warehousing workloads on both CPUs and hardware accelerators.
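The contrast the abstract draws between a general-purpose hash join and a surrogate key index join can be illustrated with a minimal sketch. The abstract does not show the authors' implementation, so everything below is an assumption made for illustration: surrogate keys are taken to be dense integers starting at 0, every fact-table foreign key is taken to reference an existing dimension row, and the function names (hash_join, build_surrogate_index, surrogate_key_join) are hypothetical.

```python
def hash_join(dim_rows, fact_fks):
    """General-purpose baseline: build a hash table on the dimension key, then probe it."""
    table = {key: payload for key, payload in dim_rows}
    return [(fk, table[fk]) for fk in fact_fks if fk in table]


def build_surrogate_index(dim_rows, keep=lambda payload: True):
    """Lay the dimension out as a dense array addressed by surrogate key.

    Assumption: keys are integers 0..n-1, so the key itself is the array offset.
    Rows rejected by the `keep` predicate are left as None (payloads must not be None).
    """
    size = max((key for key, _ in dim_rows), default=-1) + 1
    index = [None] * size
    for key, payload in dim_rows:
        if keep(payload):
            index[key] = payload
    return index


def surrogate_key_join(index, fact_fks):
    """Probe by direct array lookup: no hashing, no key comparison, no collision handling.

    Assumption: every foreign key is a valid offset into `index`.
    """
    return [(fk, index[fk]) for fk in fact_fks if index[fk] is not None]


# Toy star-schema fragment (made-up data): a region dimension and fact foreign keys.
dim = [(0, "ASIA"), (1, "EUROPE"), (2, "AMERICA")]
fact_fks = [2, 0, 1, 2, 0]

# Both joins return the same matches; only the probe mechanism differs.
assert hash_join(dim, fact_fks) == surrogate_key_join(build_surrogate_index(dim), fact_fks)
```

Under these assumptions the probe reduces to a single array access per fact row, which is one plausible reading of why such a design would be simpler to port to hardware accelerators than a hash table with collision handling.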





Acknowledgements

This work is supported by the Natural Science Foundation of China (Project Nos. 61732014 and 61772533) and the Academy of Finland (310321).

Author information

Authors and Affiliations

MOE Key Laboratory of DEKE, Renmin University of China, Beijing, China
Yansong Zhang
School of Information, Renmin University of China, Beijing, China
Yansong Zhang
National Satellite Meteorological Center of China, Beijing, China
Yu Zhang
School of Data Science and Engineering, East China Normal University, Shanghai, China
Xuan Zhou
Department of Computer Science, University of Helsinki, Helsinki, Finland
Jiaheng Lu


Corresponding author

Correspondence to Yu Zhang.


About this article

Cite this article

Zhang, Y., Zhang, Y., Zhou, X. et al. Main-memory foreign key joins on advanced processors: design and re-evaluations for OLAP workloads. Distrib Parallel Databases 37, 469–506 (2019). https://doi.org/10.1007/s10619-018-7226-4

Issue Date: December 2019
