Abstract
In database systems, most join algorithms operate on only two inputs at a time. Research into joins on more than two inputs, called multi-way joins, has shown that the intermediate partitioning steps of a traditional hash-join-based query plan can be avoided. This reduces the number of disk-based input and output operations (I/Os) that the query requires. This work studies the advantages and disadvantages of implementing and using different multi-way join algorithms, and their performance relative to traditional hash joins. Specifically, it compares hash join with three multi-way join algorithms: hash teams, generalized hash teams, and SHARP. The experimental results show that multi-way hash joins can provide a significant advantage over hash join in some cases, but not always. The cases where hash teams and generalized hash teams perform better are limited, so it does not make sense to implement these algorithms in a production database management system. SHARP provides enough of a performance advantage that it makes sense to implement it in a database system used for data warehousing.
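To make the I/O argument concrete, here is a minimal, in-memory sketch of the idea behind hash teams. It is not the authors' implementation. It assumes three inputs R, S and T that all join on one common attribute (the case basic hash teams target), and it counts tuples written to partitions as a rough proxy for disk I/O. The names `partition`, `hash_join`, `binary_plan` and `hash_team_plan` are hypothetical. The binary plan has to repartition the intermediate result of R ⋈ S before joining it with T. The hash-team plan partitions each base input once and finishes the whole three-way join partition by partition.

```python
from collections import defaultdict

def partition(rows, key, n_parts):
    """Hash-partition rows on `key`. Each tuple is written to exactly one
    partition, so the write count (our I/O proxy) is len(rows)."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[hash(row[key]) % n_parts].append(row)
    return parts, len(rows)

def hash_join(build, probe, key):
    """In-memory build/probe hash join; output rows merge both inputs' columns."""
    table = defaultdict(list)
    for b in build:
        table[b[key]].append(b)
    return [{**b, **p} for p in probe for b in table.get(p[key], [])]

def binary_plan(r, s, t, key, n_parts=4):
    """((R join S) join T) built from two binary partitioned hash joins."""
    r_parts, w_r = partition(r, key, n_parts)
    s_parts, w_s = partition(s, key, n_parts)
    inter = [row for i in range(n_parts)
             for row in hash_join(r_parts[i], s_parts[i], key)]
    # The second binary join must partition the intermediate result again.
    i_parts, w_i = partition(inter, key, n_parts)
    t_parts, w_t = partition(t, key, n_parts)
    out = [row for i in range(n_parts)
           for row in hash_join(i_parts[i], t_parts[i], key)]
    return out, w_r + w_s + w_i + w_t

def hash_team_plan(r, s, t, key, n_parts=4):
    """Hash-team style: partition every base input once on the shared key."""
    r_parts, w_r = partition(r, key, n_parts)
    s_parts, w_s = partition(s, key, n_parts)
    t_parts, w_t = partition(t, key, n_parts)
    out = []
    for i in range(n_parts):
        # Partition i of every input holds all tuples that can join with each
        # other, so the 3-way join finishes here with no repartitioning step.
        out.extend(hash_join(hash_join(r_parts[i], s_parts[i], key), t_parts[i], key))
    return out, w_r + w_s + w_t

R = [{"a": i, "r": i} for i in range(10)]
S = [{"a": i % 5, "s": i} for i in range(10)]
T = [{"a": i, "t": i} for i in range(5)]
bin_rows, bin_writes = binary_plan(R, S, T, "a")
team_rows, team_writes = hash_team_plan(R, S, T, "a")
print(len(bin_rows), bin_writes)    # 10 35
print(len(team_rows), team_writes)  # 10 25
```

Both plans return the same 10 result rows. The hash-team plan avoids writing the 10-tuple intermediate result to partitions, so the saving grows with the size of the intermediate result. The sketch deliberately ignores memory limits, hybrid hashing, and joins on different attributes. Those are the situations that generalized hash teams and SHARP are designed to address.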
Notes
1. The 100 GB TPC-H data set (scale factor 100) was tested on the hardware, but run times of many hours to days made it impractical for the tests.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Henderson, M., Lawrence, R. (2014). An Evaluation of Multi-way Joins for Relational Database Systems. In: Hammoudi, S., Cordeiro, J., Maciaszek, L., Filipe, J. (eds) Enterprise Information Systems. ICEIS 2013. Lecture Notes in Business Information Processing, vol 190. Springer, Cham. https://doi.org/10.1007/978-3-319-09492-2_3
DOI: https://doi.org/10.1007/978-3-319-09492-2_3
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09491-5
Online ISBN: 978-3-319-09492-2
eBook Packages: Computer Science, Computer Science (R0)