ABSTRACT
This paper proposes Adaptive-Multistage-Join (AM-Join) for scalable and fast equi-joins in distributed shared-nothing architectures. AM-Join utilizes (a) Tree-Join, a novel algorithm that scales well when the joined tables share hot keys, and (b) Broadcast-Join, the fastest known algorithm for joining keys that are hot in only one table.
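To make the distinction between keys that are hot in both tables and keys that are hot in only one table concrete, the following is a minimal single-process sketch, not the paper's implementation. The names `hot_keys`, `route_keys`, and `broadcast_join`, the count-based `threshold`, and the list-of-lists stand-in for partitions are all assumptions made for illustration; the abstract does not specify how hot keys are detected or how Tree-Join works.

```python
from collections import Counter, defaultdict

def hot_keys(table, threshold):
    """Return keys occurring more than `threshold` times.
    `threshold` is a hypothetical tuning knob, not a value from the paper."""
    counts = Counter(key for key, _ in table)
    return {key for key, count in counts.items() if count > threshold}

def route_keys(left, right, threshold):
    """Classify hot keys by the table(s) in which they are hot."""
    hot_left = hot_keys(left, threshold)
    hot_right = hot_keys(right, threshold)
    return {
        "hot_in_both": hot_left & hot_right,        # candidates for Tree-Join
        "hot_in_left_only": hot_left - hot_right,   # candidates for Broadcast-Join
        "hot_in_right_only": hot_right - hot_left,  # candidates for Broadcast-Join
    }

def broadcast_join(small_side, big_partitions):
    """Generic broadcast hash join: every partition of the big side probes
    a full copy of the small side's hash index, so the big side's hot
    records never have to be shuffled to a single executor."""
    index = defaultdict(list)
    for key, value in small_side:
        index[key].append(value)
    return [
        [(key, big_value, small_value)
         for key, big_value in partition
         for small_value in index.get(key, ())]
        for partition in big_partitions
    ]
```

In this sketch, the records of a key that is hot only in the left table stay in their original partitions, and only the few matching right-table records are copied to every partition.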
Unlike the state-of-the-art algorithms, AM-Join (a) holistically solves the join-key skew problem by achieving load balancing throughout the join execution, and (b) supports all outer-join variants without record deduplication or custom table partitioning. For the best AM-Join outer-join performance, we propose Index-Broadcast-Join (IB-Join) for Small-Large outer-joins, where one table fits in memory and the other is orders of magnitude larger. IB-Join improves on the state-of-the-art outer-join algorithms.
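The Small-Large outer-join setting can be illustrated with a generic broadcast-based full outer join. This is a hedged sketch under stated assumptions and is not IB-Join as defined in the paper: the function name `small_large_full_outer`, the in-memory hash index, and the shared `matched_keys` set (which a distributed system would have to assemble from per-partition reports) are all hypothetical.

```python
from collections import defaultdict

def small_large_full_outer(small, large_partitions):
    """Full outer join of a small in-memory table with a partitioned large table.
    Output rows are (key, small_value, large_value); None marks a missing side."""
    # Index the small table once; in a cluster this index would be broadcast.
    index = defaultdict(list)
    for key, value in small:
        index[key].append(value)

    joined = []
    matched_keys = set()  # small-table keys seen in any large partition
    for partition in large_partitions:
        for key, large_value in partition:
            if key in index:
                matched_keys.add(key)
                joined.extend((key, small_value, large_value)
                              for small_value in index[key])
            else:
                # Large-table record with no partner in the small table.
                joined.append((key, None, large_value))

    # Small-table records whose key never matched are emitted exactly once,
    # so no deduplication pass over the output is needed.
    unmatched = [(key, small_value, None)
                 for key, values in index.items() if key not in matched_keys
                 for small_value in values]
    return joined + unmatched
```

Because unmatched small-table records are emitted once, after all partitions have been probed, the result needs no deduplication, which is the property the abstract highlights for outer joins.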
The proposed algorithms can be adopted in any shared-nothing architecture. We implemented a MapReduce version using Spark. Our evaluation shows that the proposed algorithms execute significantly faster than the state-of-the-art algorithms and scale to more skewed and orders-of-magnitude larger tables.