
A Skew-Insensitive Algorithm for Join and Multi-join Operations on Shared Nothing Machines

  • Conference paper
Database and Expert Systems Applications (DEXA 2000)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1873)

Abstract

Join is an expensive and frequently used operation whose parallelization is highly desirable. However, the effectiveness of parallel joins depends on the ability to divide the load evenly among processors, and data skew can have a disastrous effect on performance. Although many skew-handling algorithms have been proposed, they remain generally inefficient for multi-joins because of join product skew and costly, unnecessary redistribution and communication. A parallel join algorithm called fa_join, with deterministic and near-perfect balancing properties, was introduced in an earlier paper. Despite its advantages, fa_join is sensitive to the correlation of the attribute value distributions in the two relations. We present here an improved version of the algorithm, called Sfa_join, which treats both relations symmetrically. Its predictably low join-product and attribute-value skew makes it suitable for repeated use in multi-join operations. Its performance is analyzed theoretically and experimentally to confirm its linear speed-up and its superiority over fa_join.
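To make the notion of skew concrete, the following minimal sketch shows how plain hash partitioning of two relations on the join attribute can leave one processor with most of the input tuples and most of the join output. It is not fa_join or Sfa_join, whose details are not given in the abstract; the function names, the `key % p` partitioning rule, and the sample data are all assumptions chosen for illustration.

```python
from collections import Counter


def partition_loads(r_keys, s_keys, p):
    """Per-processor input and join-output sizes under hash partitioning.

    Assumption: join-attribute values are integers and each value is sent
    to processor `key % p`. This is a textbook hash join layout, not the
    algorithm of the paper.
    """
    r_freq = Counter(r_keys)  # frequency of each join value in relation R
    s_freq = Counter(s_keys)  # frequency of each join value in relation S
    inputs = [0] * p          # tuples received by each processor
    outputs = [0] * p         # join result tuples produced by each processor

    for freq in (r_freq, s_freq):
        for key, count in freq.items():
            inputs[key % p] += count

    # A value present in both relations yields |R_key| * |S_key| results.
    for key in r_freq.keys() & s_freq.keys():
        outputs[key % p] += r_freq[key] * s_freq[key]

    return inputs, outputs


def imbalance(loads):
    """Ratio of the heaviest load to the mean load (1.0 means perfect balance)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 1.0


# Uniform join values: every processor gets the same share.
uniform_in, uniform_out = partition_loads(list(range(8)), list(range(8)), 4)

# One very frequent value (0) in both relations: processor 0 is overloaded.
skewed_r = [0] * 5 + [1, 2, 3]
skewed_s = [0] * 5 + [1, 2, 3]
skewed_in, skewed_out = partition_loads(skewed_r, skewed_s, 4)

print(uniform_in, uniform_out)  # [4, 4, 4, 4] [2, 2, 2, 2]
print(skewed_in, skewed_out)    # [10, 2, 2, 2] [25, 1, 1, 1]
print(imbalance(skewed_in), imbalance(skewed_out))
```

In the skewed case the output imbalance (25 of 28 result tuples on one processor) is worse than the input imbalance, which is the join product skew the abstract refers to: a frequent value on both sides multiplies rather than adds.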



Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bamha, M., Hains, G. (2000). A Skew-Insensitive Algorithm for Join and Multi-join Operations on Shared Nothing Machines. In: Ibrahim, M., Küng, J., Revell, N. (eds) Database and Expert Systems Applications. DEXA 2000. Lecture Notes in Computer Science, vol 1873. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44469-6_60

  • DOI: https://doi.org/10.1007/3-540-44469-6_60

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67978-3

  • Online ISBN: 978-3-540-44469-5

  • eBook Packages: Springer Book Archive
