An Evaluation of Multi-way Joins for Relational Database Systems

  • Conference paper
Enterprise Information Systems (ICEIS 2013)

Part of the book series: Lecture Notes in Business Information Processing ((LNBIP,volume 190))

Abstract

In database systems, most join algorithms operate on only two inputs at a time. Research into joins on more than two inputs, called multi-way joins, has shown that the intermediate partitioning steps of a traditional hash-join-based query plan can be avoided, which reduces the amount of disk-based input and output (I/O) that the query requires. This work studies the advantages and disadvantages of implementing and using different multi-way join algorithms, and their performance relative to traditional hash joins. Specifically, it compares hash join with three multi-way join algorithms: hash teams, generalized hash teams, and SHARP. The experimental results show that multi-way hash joins can provide a significant advantage over hash join in some cases, but not always. The cases where hash teams and generalized hash teams perform better are limited, so it does not make sense to implement these algorithms in a production database management system. SHARP provides enough of a performance advantage that it makes sense to implement it in a database system used for data warehousing.
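The structural difference described in the abstract can be illustrated with a minimal in-memory sketch. It is not the authors' implementation and does not reproduce hash teams, generalized hash teams, or SHARP: it assumes three relations that all join on one shared attribute, represents rows as Python dicts, and ignores partitioning and disk I/O entirely. All function names are hypothetical. It only shows that a plan of two binary hash joins materializes an intermediate result, while a multi-way join probes several hash tables at once and does not.

```python
from collections import defaultdict
from itertools import product


def build_hash_table(rows, key):
    """Group rows (dicts) by the value of the join attribute."""
    table = defaultdict(list)
    for row in rows:
        table[row[key]].append(row)
    return table


def binary_hash_join(build_rows, probe_rows, key):
    """Two-input hash join; returns a materialized list of merged rows."""
    table = build_hash_table(build_rows, key)
    return [{**b, **p} for p in probe_rows for b in table.get(p[key], [])]


def cascaded_join(r, s, t, key):
    """Two binary joins: (r JOIN s) is materialized, then joined with t."""
    intermediate = binary_hash_join(r, s, key)
    return binary_hash_join(intermediate, t, key)


def multiway_join(r, s, t, key):
    """Three-input join on one shared attribute (illustrative assumption).

    Hash tables are built on r and s, and each t row probes both,
    so no intermediate (r JOIN s) result is ever stored.
    """
    r_table = build_hash_table(r, key)
    s_table = build_hash_table(s, key)
    out = []
    for t_row in t:
        k = t_row[key]
        for r_row, s_row in product(r_table.get(k, []), s_table.get(k, [])):
            out.append({**r_row, **s_row, **t_row})
    return out


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    r = [{"k": 1, "a": "r1"}, {"k": 2, "a": "r2"}]
    s = [{"k": 1, "b": "s1"}, {"k": 1, "b": "s2"}, {"k": 3, "b": "s3"}]
    t = [{"k": 1, "c": "t1"}, {"k": 2, "c": "t2"}]
    for row in multiway_join(r, s, t, "k"):
        print(row)
```

Both plans return the same rows on this toy input. In a disk-based system the saving would come from not writing and re-reading the partitioned intermediate result, which this in-memory sketch does not model.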

Notes

  1. The TPC-H data set at scale factor 100 GB was tested on the hardware, but run times of many hours to days made it impractical for the tests.

Author information

Correspondence to Ramon Lawrence.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Henderson, M., Lawrence, R. (2014). An Evaluation of Multi-way Joins for Relational Database Systems. In: Hammoudi, S., Cordeiro, J., Maciaszek, L., Filipe, J. (eds) Enterprise Information Systems. ICEIS 2013. Lecture Notes in Business Information Processing, vol 190. Springer, Cham. https://doi.org/10.1007/978-3-319-09492-2_3

  • DOI: https://doi.org/10.1007/978-3-319-09492-2_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09491-5

  • Online ISBN: 978-3-319-09492-2

  • eBook Packages: Computer Science, Computer Science (R0)
