Main-memory foreign key joins on advanced processors: design and re-evaluations for OLAP workloads

Distributed and Parallel Databases

Abstract

The hash join family is one of the leading techniques for evaluating equi-joins. OLAP systems draw on this line of research to implement foreign key joins between dimension tables and large fact tables efficiently. From the perspective of data warehouse schemas and workload features, the hash join algorithm can be further simplified with multidimensional mapping, and foreign key join algorithms can be evaluated from multiple perspectives rather than from performance alone. In this paper, we introduce the surrogate key index oriented foreign key join, a schema-conscious design customized for OLAP workloads, and use it to comprehensively evaluate how state-of-the-art join algorithms perform in OLAP workloads. Our experiments and analysis yield the following insights: (1) a foreign key join algorithm customized for OLAP workloads can outperform general-purpose hash joins; (2) under a fine-grained micro join benchmark, each join algorithm shows strong and weak performance regions dominated by the cache locality ratio input_size/cache_size; (3) the simple hardware-oblivious shared hash table join outperforms the complex hardware-conscious radix partitioning hash join in most benchmark cases; (4) the customized foreign key join with a surrogate key index reduces algorithm complexity and is therefore easy to implement on different hardware accelerators. Overall, we argue that improving join performance is a systematic effort rather than a matter of hardware-conscious algorithm optimization alone, and that OLAP domain knowledge makes the surrogate key index effective for foreign key joins in data warehousing workloads on both CPUs and hardware accelerators.
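The abstract does not spell out the mechanics of the surrogate key index, so the following is only a minimal sketch of one plausible reading. It assumes each dimension table carries a dense integer surrogate key in 0..n-1, so that a fact table foreign key can address a vector slot directly instead of probing a hash table. The function names (`build_key_vector`, `surrogate_key_join`, `hash_join`) and the sample data are hypothetical and do not come from the paper.

```python
from typing import Callable, List, Optional, Tuple

DimRow = Tuple[int, str]  # (surrogate_key, payload); keys assumed dense in 0..n-1


def build_key_vector(dim_rows: List[DimRow],
                     keep: Callable[[str], bool]) -> List[Optional[str]]:
    """Build a vector indexed by surrogate key; filtered-out rows stay None."""
    vector: List[Optional[str]] = [None] * len(dim_rows)
    for key, payload in dim_rows:
        if keep(payload):
            vector[key] = payload
    return vector


def surrogate_key_join(fact_fks: List[int],
                       vector: List[Optional[str]]) -> List[Tuple[int, str]]:
    """Probe by direct addressing: the foreign key is the vector position."""
    out = []
    for row_id, fk in enumerate(fact_fks):
        payload = vector[fk]
        if payload is not None:
            out.append((row_id, payload))
    return out


def hash_join(fact_fks: List[int], dim_rows: List[DimRow],
              keep: Callable[[str], bool]) -> List[Tuple[int, str]]:
    """General-purpose baseline: build a hash table, then probe it."""
    table = {key: payload for key, payload in dim_rows if keep(payload)}
    return [(row_id, table[fk])
            for row_id, fk in enumerate(fact_fks) if fk in table]


# Hypothetical sample data: a three-row dimension and four fact rows.
dim_rows = [(0, "ASIA"), (1, "EUROPE"), (2, "AMERICA")]
fact_fks = [2, 0, 1, 0]
keep = lambda payload: payload != "EUROPE"

vector = build_key_vector(dim_rows, keep)
print(surrogate_key_join(fact_fks, vector))
print(hash_join(fact_fks, dim_rows, keep))
```

Both functions return the same matches on this input. The difference is that the vector probe needs no hashing or collision handling, which is one way the simplification mentioned in insight (4) could arise. Whether this helps in practice would depend on the size of the vector relative to the cache, as insight (2) suggests.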

Acknowledgements

This work is supported by the Natural Science Foundation of China (Project Nos. 61732014, 61772533) and the Academy of Finland (310321).

Author information

Corresponding author

Correspondence to Yu Zhang.

About this article

Cite this article

Zhang, Y., Zhang, Y., Zhou, X. et al. Main-memory foreign key joins on advanced processors: design and re-evaluations for OLAP workloads. Distrib Parallel Databases 37, 469–506 (2019). https://doi.org/10.1007/s10619-018-7226-4

  • DOI: https://doi.org/10.1007/s10619-018-7226-4
