DOI: 10.1145/2723372.2750547
research-article

Rack-Scale In-Memory Join Processing using RDMA

Published: 27 May 2015

Abstract

Database systems running on a cluster of machines, i.e. rack-scale databases, are a common architecture for many large databases and data appliances. As the data movement across machines is often a significant bottleneck, these systems typically use a low-latency, high-throughput network such as InfiniBand. To achieve the necessary performance, parallel join algorithms must take advantage of the primitives provided by the network to speed up data transfer.
In this paper we focus on implementing parallel in-memory joins using Remote Direct Memory Access (RDMA), a communication mechanism to transfer data directly into the memory of a remote machine. The results of this paper are, to our knowledge, the first detailed analysis of parallel hash joins using RDMA. To capture their behavior independently of the network characteristics, we develop an analytical model and test our implementation on two different types of networks. The experimental results show that the model is accurate and the resulting distributed join exhibits good performance.
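
As a rough illustration of the kind of algorithm the abstract refers to, the following is a minimal sketch of a partitioned hash join spread over several machines. It is not the authors' implementation: the function names (`partition`, `shuffle`, `local_hash_join`, `distributed_hash_join`) are hypothetical, the machines are simulated as in-process lists, and the remote write that RDMA would perform is replaced by a plain list append.

```python
from collections import defaultdict


def partition(tuples, num_machines):
    """Split (key, payload) tuples into one buffer per machine by hashing the key."""
    buffers = [[] for _ in range(num_machines)]
    for key, payload in tuples:
        buffers[hash(key) % num_machines].append((key, payload))
    return buffers


def shuffle(relation_by_machine, num_machines):
    """Simulate the network phase: each machine ships every partition to its owner.

    Assumption: the list extend below stands in for writing into the owner's
    memory; no real RDMA call is made anywhere in this sketch.
    """
    received = [[] for _ in range(num_machines)]
    for local_tuples in relation_by_machine:
        for target, part in enumerate(partition(local_tuples, num_machines)):
            received[target].extend(part)
    return received


def local_hash_join(r_part, s_part):
    """Build a hash table on the R partition, then probe it with the S partition."""
    table = defaultdict(list)
    for key, payload in r_part:
        table[key].append(payload)
    return [(key, r_payload, s_payload)
            for key, s_payload in s_part
            for r_payload in table.get(key, [])]


def distributed_hash_join(r_by_machine, s_by_machine):
    """Shuffle both relations by key, then join matching partitions machine by machine."""
    n = len(r_by_machine)
    r_parts = shuffle(r_by_machine, n)
    s_parts = shuffle(s_by_machine, n)
    result = []
    for machine in range(n):
        result.extend(local_hash_join(r_parts[machine], s_parts[machine]))
    return result
```

Because both relations are partitioned with the same hash function, tuples with equal keys always land on the same machine, so each machine can finish its part of the join without further communication. In a real deployment the shuffle step is where the network primitives discussed in the abstract would come into play.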

Published In

SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
May 2015
2110 pages
ISBN:9781450327589
DOI:10.1145/2723372
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. distributed join
  2. distributed query processing
  3. join processing with rdma
  4. rack scale databases

Qualifiers

  • Research-article

Funding Sources

  • Oracle Labs

Conference

SIGMOD/PODS'15: International Conference on Management of Data
May 31 - June 4, 2015
Melbourne, Victoria, Australia

Acceptance Rates

SIGMOD '15 Paper Acceptance Rate 106 of 415 submissions, 26%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%

Article Metrics

  • Downloads (Last 12 months): 55
  • Downloads (Last 6 weeks): 6
Reflects downloads up to 05 Mar 2025

Cited By

  • (2025) Nezha: An Efficient Distributed Graph Processing System on Heterogeneous Hardware. Proceedings of the ACM on Management of Data, 3:1 (1-27). DOI: 10.1145/3709707. Online publication date: 11-Feb-2025
  • (2025) Scalable Data Management on Next-Generation Data Center Networks. Scalable Data Management for Future Hardware (199-221). DOI: 10.1007/978-3-031-74097-8_8. Online publication date: 24-Jan-2025
  • (2024) Object-oriented Unified Encrypted Memory Management for Heterogeneous Memory Architectures. Proceedings of the ACM on Management of Data, 2:3 (1-29). DOI: 10.1145/3654958. Online publication date: 30-May-2024
  • (2024) Zero-sided RDMA: Network-driven Data Shuffling for Disaggregated Heterogeneous Cloud DBMSs. Proceedings of the ACM on Management of Data, 2:1 (1-28). DOI: 10.1145/3639291. Online publication date: 26-Mar-2024
  • (2024) Towards Buffer Management with Tiered Main Memory. Proceedings of the ACM on Management of Data, 2:1 (1-26). DOI: 10.1145/3639286. Online publication date: 26-Mar-2024
  • (2024) SplitFT: Fault Tolerance for Disaggregated Datacenters via Remote Memory Logging. Proceedings of the Nineteenth European Conference on Computer Systems (590-607). DOI: 10.1145/3627703.3629561. Online publication date: 22-Apr-2024
  • (2024) Data-centric workloads with MPI_Sort. Journal of Parallel and Distributed Computing (104833). DOI: 10.1016/j.jpdc.2023.104833. Online publication date: Jan-2024
  • (2024) Optimizing LSM-based indexes for disaggregated memory. The VLDB Journal, 33:6 (1813-1836). DOI: 10.1007/s00778-024-00863-y. Online publication date: 19-Jun-2024
  • (2023) Exploiting Cloud Object Storage for High-Performance Analytics. Proceedings of the VLDB Endowment, 16:11 (2769-2782). DOI: 10.14778/3611479.3611486. Online publication date: 24-Aug-2023
  • (2023) Zero-sided RDMA: Network-driven Data Shuffling. Proceedings of the 19th International Workshop on Data Management on New Hardware (82-85). DOI: 10.1145/3592980.3595302. Online publication date: 18-Jun-2023
