research-article

Query processing techniques for solid state drives

Authors: Dimitris Tsirogiannis, Stavros Harizopoulos, Janet L. Wiener, Goetz Graefe

SIGMOD '09: Proceedings of the 2009 ACM SIGMOD International Conference on Management of data

Pages 59 - 72

https://doi.org/10.1145/1559845.1559854

Published: 29 June 2009

Abstract

Solid state drives perform random reads more than 100x faster than traditional magnetic hard disks, while offering comparable sequential read and write bandwidth. Because of their potential to speed up applications, as well as their reduced power consumption, these new drives are expected to gradually replace hard disks as the primary permanent storage media in large data centers. However, although they may immediately benefit applications that stress random reads, they may not improve database applications, especially those running long data analysis queries. Database query processing engines have been designed around the speed mismatch between random and sequential I/O on hard disks, and their algorithms currently emphasize sequential accesses for disk-resident data.

In this paper, we investigate data structures and algorithms that leverage fast random reads to speed up selection, projection, and join operations in relational query processing. We first demonstrate how a column-based layout within each page reduces the amount of data read during selections and projections. We then introduce FlashJoin, a general pipelined join algorithm that minimizes accesses to base and intermediate relational data. FlashJoin's binary join kernel accesses only the join attributes, producing partial results in the form of a join index. Subsequently, its fetch kernel retrieves the attributes for later nodes in the query plan as they are needed. FlashJoin significantly reduces memory and I/O requirements for each join in the query. We implemented these techniques inside Postgres and experimented with an enterprise SSD. Our techniques improved query runtimes by up to 6x for queries ranging from simple relational scans and joins to full TPC-H queries.
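The split between a join kernel and a fetch kernel described above can be illustrated with a minimal sketch. This is not the authors' Postgres implementation: the tables, column names, and the functions `join_kernel` and `fetch_kernel` are hypothetical, the data lives in Python lists rather than on an SSD, and a plain in-memory hash join stands in for whatever binary join the kernel actually uses. The point is only that the join reads the key columns alone and emits a join index of row-id pairs, and that other attributes are looked up by row id later.

```python
from collections import defaultdict

# Hypothetical toy tables stored column-wise: column name -> list of values.
# A row id (RID) is simply the position of a value in each list.
orders = {
    "o_id": [10, 11, 12, 13],
    "cust_id": [1, 2, 1, 3],
    "total": [250.0, 40.0, 99.5, 12.0],
}
customers = {
    "cust_id": [1, 2, 4],
    "name": ["Ada", "Bo", "Cy"],
}


def join_kernel(left_keys, right_keys):
    """Join on the key columns only; return a join index of (left RID, right RID)."""
    buckets = defaultdict(list)
    for rid, key in enumerate(right_keys):      # build phase on the right input
        buckets[key].append(rid)
    join_index = []
    for left_rid, key in enumerate(left_keys):  # probe phase with the left input
        for right_rid in buckets.get(key, ()):
            join_index.append((left_rid, right_rid))
    return join_index


def fetch_kernel(join_index, left, left_cols, right, right_cols):
    """Look up only the requested columns by RID (random reads on a real device)."""
    for left_rid, right_rid in join_index:
        row = {c: left[c][left_rid] for c in left_cols}
        row.update({c: right[c][right_rid] for c in right_cols})
        yield row


# The join touches only the two cust_id columns.
join_index = join_kernel(orders["cust_id"], customers["cust_id"])

# Attributes needed further up the plan are fetched afterwards, on demand.
result = list(fetch_kernel(join_index, orders, ["total"], customers, ["name"]))
```

In this toy version the fetch step is a list lookup. On flash storage the same step would translate into random page reads, which is where the fast random reads mentioned in the abstract would matter.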

Index Terms

Query processing techniques for solid state drives
1. Information systems
  1. Data management systems
    1. Data structures
      1. Data access methods
    2. Database management system engines
      1. Database query processing
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Information

Published In

SIGMOD '09: Proceedings of the 2009 ACM SIGMOD International Conference on Management of data

June 2009

1168 pages

ISBN:9781605585512

DOI:10.1145/1559845

Editors: Carsten Binnig, Benoit Dageville

General Chairs: Uğur Çetintemel (Brown University, USA), Stan Zdonik (Brown University, USA)

Program Chair: Donald Kossmann (ETH Zurich, Switzerland)

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SIGMOD/PODS '09: International Conference on Management of Data

June 29 - July 2, 2009

Providence, Rhode Island, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Article Metrics

Total Citations: 136
Total Downloads: 1,955
Downloads (Last 12 months): 30
Downloads (Last 6 weeks): 2

Reflects downloads up to 15 Feb 2025

Cited By

Klyuchikov E, Polyntsov M, Chizhov A, Mikhailova E, Chernishev G (2024) Hybrid Materialization in a Disk-Based Column-Store. Proceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD) (164-172). Online publication date: 4-Jan-2024
https://dl.acm.org/doi/10.1145/3632410.3632422
Firsov M, Polyntsov M, Smirnov K, Chernishev G (2023) Finding a Second Wind: Speeding Up Graph Traversal Queries in RDBMSs Using Column-Oriented Processing. Model and Data Engineering (186-199). Online publication date: 22-Dec-2023
https://doi.org/10.1007/978-3-031-49333-1_14
Bessho Y, Hayamizu Y, Goda K, Kitsuregawa M (2022) Dynamic Fault Tolerance for Multi-Node Query Processing. IEICE Transactions on Information and Systems E105.D:5 (909-919). Online publication date: 1-May-2022
https://doi.org/10.1587/transinf.2021DAP0004
Guo B, Yu J, Yang D, Leng H, Liao B (2022) Energy-Efficient Database Systems: A Systematic Survey. ACM Computing Surveys 55:6 (1-53). Online publication date: 7-Dec-2022
https://dl.acm.org/doi/10.1145/3538225
Wang J, Lin C, Papakonstantinou Y, Swanson S (2021) Evaluating List Intersection on SSDs for Parallel I/O Skipping. 2021 IEEE 37th International Conference on Data Engineering (ICDE) (1823-1828). Online publication date: Apr-2021
https://doi.org/10.1109/ICDE51399.2021.00161
Bonnet P, Bouganim L, Koltsidas I, Viglas S (2020) System co-design and data management for flash devices. Proceedings of the VLDB Endowment 4:12 (1504-1505). Online publication date: 3-Jun-2020
https://dl.acm.org/doi/10.14778/3402755.3402807
Kojić N, Milićev D (2020) Equilibrium of Redundancy in Relational Model for Optimized Data Retrieval. IEEE Transactions on Knowledge and Data Engineering 32:9 (1707-1721). Online publication date: 5-Aug-2020
https://dl.acm.org/doi/10.1109/TKDE.2019.2911580
Wang Y, Kogan A (2019) Cloud-Based In-Memory Columnar Database Architecture for Continuous Audit Analytics. Journal of Information Systems 34:2 (87-107). Online publication date: 2-Aug-2019
https://doi.org/10.2308/isys-52531
Rahiman A, Karim B (2019) Continuous Media (CM) Data Stream in Flash-based Solid State Disk (SSD) Storage Server. 2019 2nd International Conference on Communication Engineering and Technology (ICCET) (11-16). Online publication date: Apr-2019
https://doi.org/10.1109/ICCET.2019.8726912
Badu-Marfo G, Farooq B, Patterson Z (2019) A Perspective on the Challenges and Opportunities for Privacy-Aware Big Transportation Data. Journal of Big Data Analytics in Transportation 1:1 (1-23). Online publication date: 4-Apr-2019
https://doi.org/10.1007/s42421-019-00001-z
