research-article

Memory footprint matters: efficient equi-join algorithms for main memory data processing

Authors:

Jignesh M. Patel

SOCC '13: Proceedings of the 4th annual Symposium on Cloud Computing

Article No.: 19, Pages 1 - 16

https://doi.org/10.1145/2523616.2523626

Published: 01 October 2013

Abstract

High-performance analytical data processing systems often run on servers with large amounts of main memory. A common operation in such environments is combining data from two or more sources using some "join" algorithm. The focus of this paper is on studying hash-based and sort-based equi-join algorithms when the data sets being joined fully reside in main memory. We only consider a single node setting, which is an important building block for larger high-performance distributed data processing systems. A critical contribution of this work is in pointing out that in addition to query response time, one must also consider the memory footprint of each join algorithm, as it impacts the number of concurrent queries that can be serviced. Memory footprint becomes an important deployment consideration when running analytical data processing services on hardware that is shared by other concurrent services. We also consider the impact of particular physical properties of the input and the output of each join algorithm. This information is essential for optimizing complex query pipelines with multiple joins. Our key contribution is in characterizing the properties of hash-based and sort-based equi-join algorithms, thereby allowing system implementers and query optimizers to make a more informed choice about which join algorithm to use.
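To make the two algorithm families named in the abstract concrete, here is a minimal single-threaded sketch of a hash-based and a sort-based equi-join. It is not the implementation studied in the paper: the function names `hash_join` and `sort_merge_join` and the (key, payload) tuple layout are assumptions chosen for illustration only.

```python
from collections import defaultdict


def hash_join(build, probe):
    """Equi-join two lists of (key, payload) tuples using a hash table.

    The hash table holds every build-side tuple, so the extra memory
    grows with the size of the build input.
    """
    table = defaultdict(list)
    for key, payload in build:
        table[key].append(payload)

    out = []
    for key, payload in probe:
        # .get() avoids inserting empty entries for keys with no match.
        for match in table.get(key, ()):
            out.append((key, match, payload))
    return out


def sort_merge_join(left, right):
    """Equi-join two lists of (key, payload) tuples by sorting, then merging.

    This sketch sorts copies of the inputs; the output is produced in
    join-key order.
    """
    left = sorted(left, key=lambda t: t[0])
    right = sorted(right, key=lambda t: t[0])

    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Locate the run of right-side tuples sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left-side tuple with this key against that run.
            while i < len(left) and left[i][0] == lk:
                for _, r_payload in right[j:j_end]:
                    out.append((lk, left[i][1], r_payload))
                i += 1
            j = j_end
    return out
```

Both functions return the same set of joined rows. They differ in how much auxiliary memory they use and in the order of their output, which are the kinds of properties the abstract says should inform the choice of join algorithm.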

Published In

SOCC '13: Proceedings of the 4th annual Symposium on Cloud Computing

October 2013

427 pages

ISBN: 9781450324281

DOI: 10.1145/2523616

General Chair:
Guy Lohman

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Microsoft Jim Gray Systems Lab
Division of Information and Intelligent Systems

Conference

SOCC '13

SOCC '13: ACM Symposium on Cloud Computing

October 1 - 3, 2013

Santa Clara, California

Acceptance Rates

SOCC '13 Paper Acceptance Rate 23 of 114 submissions, 20%;

Overall Acceptance Rate 169 of 722 submissions, 23%

Bibliometrics

Total Citations: 9
Total Downloads: 190
Downloads (Last 12 months): 8
Downloads (Last 6 weeks): 2

Reflects downloads up to 01 Mar 2025

Citations

Cited By

Lee S, Lim C, Choi J, Choi H, Lee C, Park Y, Park K, Kim H, Kim Y (2024). SPID-Join: A Skew-resistant Processing-in-DIMM Join Algorithm Exploiting the Bank- and Rank-level Parallelisms of DIMMs. Proceedings of the ACM on Management of Data 2:6 (1-27). DOI: 10.1145/3698827. Online publication date: 20-Dec-2024.
Li H, Jin H, Zheng L, Liao X (2020). ReSQM: Accelerating Database Operations Using ReRAM-Based Content Addressable Memory. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 39:11 (4030-4041). DOI: 10.1109/TCAD.2020.3012860. Online publication date: Nov-2020.
Chen R, Prasanna V (2016). Accelerating Equi-Join on a CPU-FPGA Heterogeneous Platform. 2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) (212-219). DOI: 10.1109/FCCM.2016.62. Online publication date: May-2016.
Yu C, Boyd J (2016). FB+-tree for Big Data Management. Big Data Research 4:C (25-36). DOI: 10.1016/j.bdr.2015.11.003. Online publication date: 1-Jun-2016.
Liu F, Blanas S, Ghandeharizadeh S, Balazinska M, Freedman M, Barahmand S (2015). Forecasting the cost of processing multi-join queries via hashing for main-memory databases. Proceedings of the Sixth ACM Symposium on Cloud Computing (153-166). DOI: 10.1145/2806777.2806944. Online publication date: 27-Aug-2015.
Feng Z, Lo E, Kao B, Xu W, Sellis T, Davidson S, Ives Z (2015). ByteSlice. Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (31-46). DOI: 10.1145/2723372.2747642. Online publication date: 27-May-2015.
Zhang H, Chen G, Ooi B, Tan K, Zhang M (2015). In-Memory Big Data Management and Processing: A Survey. IEEE Transactions on Knowledge and Data Engineering 27:7 (1920-1948). DOI: 10.1109/TKDE.2015.2427795. Online publication date: 1-Jul-2015.
Sellis T, Davidson S, Ives Z (2015). Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. Online publication date: 27-May-2015.
Blanas S, Wu K, Byna S, Dong B, Shoshani A, Dyreson C, Li F, Özsu M (2014). Parallel data analysis directly on scientific file formats. Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (385-396). DOI: 10.1145/2588555.2612185. Online publication date: 18-Jun-2014.
