DOI: 10.1145/2588555.2595635
research-article

Parallel I/O aware query optimization

Published: 18 June 2014

Abstract

New trends in the storage industry suggest that in the near future most hard disk drive-based storage subsystems will be replaced by solid state drives (SSDs). Database management systems can benefit substantially from the superior I/O performance of SSDs. Although the impact of using SSDs in query processing has been studied in the past, exploiting the I/O parallelism of SSDs in query processing and optimization has not received enough attention. In this paper, we first show why the query optimizer needs to be aware of the benefit of I/O parallelism in solid state drives. We characterize the benefit of exploiting I/O parallelism in database scan operators in SAP SQL Anywhere and propose a novel, general I/O cost model that considers the impact of device I/O queue depth on I/O cost estimation. We show that with this model, the best plans found by the optimizer are much closer to optimal. The proposed model is implemented in SAP SQL Anywhere. The model is defined dynamically by a calibration process and summarizes the behavior of the I/O subsystem without any prior knowledge of the type or number of devices used in the storage subsystem.
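The abstract does not give the exact form of the cost model, so the following is only a rough illustration of the general idea of a queue-depth-aware I/O cost estimate. It assumes a calibration table that maps device queue depth (number of outstanding read requests) to an average cost per page read, and estimates a scan's I/O cost by linearly interpolating in that table. The class name `QueueDepthCostModel`, the interpolation scheme, the plan sizes, and all numbers are hypothetical; none of them are taken from the paper or from SAP SQL Anywhere.

```python
from bisect import bisect_left


class QueueDepthCostModel:
    """Hypothetical I/O cost model: average cost per page read as a function of
    the number of outstanding I/O requests (queue depth).

    The calibration table is assumed to come from a separate measurement step
    (not shown) that issues page reads at several fixed queue depths.
    """

    def __init__(self, calibration):
        if not calibration:
            raise ValueError("calibration table must not be empty")
        points = sorted(calibration)  # sort by queue depth
        self.depths = [depth for depth, _ in points]
        self.costs = [cost for _, cost in points]

    def ms_per_page(self, queue_depth):
        """Interpolate the per-page cost (ms) at the given queue depth."""
        # Outside the calibrated range, reuse the nearest measured value.
        if queue_depth <= self.depths[0]:
            return self.costs[0]
        if queue_depth >= self.depths[-1]:
            return self.costs[-1]
        # Index of the first calibrated depth >= queue_depth.
        i = bisect_left(self.depths, queue_depth)
        d0, d1 = self.depths[i - 1], self.depths[i]
        c0, c1 = self.costs[i - 1], self.costs[i]
        # Linear interpolation between the two neighbouring calibration points.
        frac = (queue_depth - d0) / (d1 - d0)
        return c0 + frac * (c1 - c0)

    def scan_cost_ms(self, pages, queue_depth):
        """Estimated I/O time (ms) to read `pages` pages at `queue_depth`."""
        return pages * self.ms_per_page(queue_depth)


# Illustrative numbers only: (queue depth, ms per page read).
CALIBRATION = [(1, 0.10), (4, 0.03), (16, 0.012), (32, 0.010)]

if __name__ == "__main__":
    model = QueueDepthCostModel(CALIBRATION)
    # Assumed: the table scan keeps 32 reads in flight via prefetching,
    # while the index scan issues one dependent read at a time.
    table_scan = model.scan_cost_ms(pages=5000, queue_depth=32)  # about 50 ms
    index_scan = model.scan_cost_ms(pages=800, queue_depth=1)    # about 80 ms
    # A depth-unaware model would charge every read at queue depth 1.
    naive_table_scan = model.scan_cost_ms(pages=5000, queue_depth=1)  # about 500 ms
    print(f"depth-aware:   table={table_scan:.1f} ms, index={index_scan:.1f} ms")
    print(f"depth-unaware: table={naive_table_scan:.1f} ms, index={index_scan:.1f} ms")
```

With these made-up numbers, a model that charges every read at queue depth 1 would prefer the 800-page index scan, while the depth-aware estimate prefers the 5,000-page table scan. Linear interpolation and clamping at the ends of the table are simply the easiest choices for a sketch; a real calibrated model could use a different functional form.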

Information

Published In

SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data
June 2014
1645 pages
ISBN:9781450323765
DOI:10.1145/2588555
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. I/O cost model
  2. SSD
  3. access path
  4. full table scan
  5. index scan
  6. parallel I/O
  7. prefetching
  8. query optimization

Qualifiers

  • Research-article

Conference

SIGMOD/PODS'14

Acceptance Rates

SIGMOD '14 Paper Acceptance Rate 107 of 421 submissions, 25%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%

Article Metrics

  • Downloads (Last 12 months)14
  • Downloads (Last 6 weeks)1
Reflects downloads up to 13 Feb 2025

Cited By

  • (2023) AutoML in heavily constrained applications. The VLDB Journal 33:4, 957-979. DOI: 10.1007/s00778-023-00820-1. Online publication date: 17-Nov-2023
  • (2022) An NVM SSD-based High Performance Query Processing Framework for Search Engines. IEEE Transactions on Knowledge and Data Engineering, 1-1. DOI: 10.1109/TKDE.2022.3160557. Online publication date: 2022
  • (2022) Parallelizing Git Checkout: a Case Study of I/O Parallelism. 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 293-304. DOI: 10.1109/SBAC-PAD55451.2022.00040. Online publication date: Nov-2022
  • (2021) External-memory Dictionaries in the Affine and PDAM Models. ACM Transactions on Parallel Computing 8:3, 1-20. DOI: 10.1145/3470635. Online publication date: 20-Sep-2021
  • (2021) Evaluating List Intersection on SSDs for Parallel I/O Skipping. 2021 IEEE 37th International Conference on Data Engineering (ICDE), 1823-1828. DOI: 10.1109/ICDE51399.2021.00161. Online publication date: Apr-2021
  • (2020) An NVM SSD-Optimized Query Processing Framework. Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 935-944. DOI: 10.1145/3340531.3412010. Online publication date: 19-Oct-2020
  • (2020) ParIS+: Data Series Indexing on Multi-core Architectures. IEEE Transactions on Knowledge and Data Engineering, 1-1. DOI: 10.1109/TKDE.2020.2975180. Online publication date: 2020
  • (2019) Small Refinements to the DAM Can Have Big Consequences for Data-Structure Design. The 31st ACM Symposium on Parallelism in Algorithms and Architectures, 265-274. DOI: 10.1145/3323165.3323210. Online publication date: 17-Jun-2019
  • (2019) Characterization of a Big Data Storage Workload in the Cloud. Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering, 33-44. DOI: 10.1145/3297663.3310302. Online publication date: 4-Apr-2019
  • (2018) ParIS: The Next Destination for Fast Data Series Indexing and Query Answering. 2018 IEEE International Conference on Big Data (Big Data), 791-800. DOI: 10.1109/BigData.2018.8622293. Online publication date: Dec-2018