research-article

SH2O: Efficient Data Access for Work-Sharing Databases

Authors:
Panagiotis Sioulas, Oracle, Zurich, Switzerland (ORCID: 0000-0001-5451-5729)
Ioannis Mytilinis, Oracle, Zurich, Switzerland (ORCID: 0000-0002-9901-0721)
Anastasia Ailamaki, EPFL, Lausanne, Switzerland (ORCID: 0000-0002-9949-3639)

Proceedings of the ACM on Management of Data, Volume 1, Issue 3, Article No. 220, pp. 1–26. https://doi.org/10.1145/3617340

Published: 13 November 2023

Abstract

Interactive applications require processing tens to hundreds of concurrent analytical queries within tight time constraints. In such setups, where high concurrency causes contention, work-sharing databases are critical for improving scalability and for bounding the increase in response time. However, as such databases share data access using full scans and expensive shared filters, they suffer from a data-access bottleneck that jeopardizes interactivity.

We present SH2O: a novel data-access operator that addresses the data-access bottleneck of work-sharing databases. SH2O is based on the idea that an access pattern based on judiciously selected multidimensional ranges can replace a set of shared filters. To exploit the idea in an efficient and scalable manner, SH2O uses a three-tier approach: i) it uses spatial indices to efficiently access the ranges without overfetching, ii) it uses an optimizer to choose which filters to replace such that it maximizes cost-benefit for index accesses, and iii) it exploits partitioning schemes and independently accesses each data partition to reduce the number of filters in the access pattern. Furthermore, we propose a tuning strategy that chooses a partitioning and indexing scheme that minimizes SH2O's cost for a target workload. Our evaluation shows a speedup of 1.8x to 22.2x for batches of hundreds of data-access-bound queries.
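
To make the core idea concrete, the following is a minimal Python sketch, not the authors' implementation, that contrasts a full scan with shared filters against index-based access to multidimensional ranges. Everything in it is an illustrative assumption: the names `shared_filter_scan`, `GridIndex`, and `range_access` are hypothetical, filters are restricted to two-dimensional inclusive boxes, and a toy uniform grid stands in for whichever spatial index a real system would use. The optimizer and partitioning tiers are not modeled.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Point = Tuple[int, int]
# A range filter is an axis-aligned box ((x_lo, x_hi), (y_lo, y_hi)), bounds inclusive.
Box = Tuple[Tuple[int, int], Tuple[int, int]]


def matches(p: Point, box: Box) -> bool:
    (x_lo, x_hi), (y_lo, y_hi) = box
    return x_lo <= p[0] <= x_hi and y_lo <= p[1] <= y_hi


def shared_filter_scan(data: List[Point], queries: Dict[str, Box]):
    """Baseline: full scan that tests every tuple against every query's filter."""
    result = defaultdict(set)
    for row_id, p in enumerate(data):
        for qid, box in queries.items():
            if matches(p, box):
                result[row_id].add(qid)
    # Returns (row id -> ids of queries it satisfies, number of tuples touched).
    return dict(result), len(data)


class GridIndex:
    """Toy uniform-grid spatial index (assumed stand-in for a real spatial index)."""

    def __init__(self, data: List[Point], cell: int):
        self.data, self.cell = data, cell
        self.cells = defaultdict(list)
        for row_id, (x, y) in enumerate(data):
            self.cells[(x // cell, y // cell)].append(row_id)

    def candidates(self, box: Box) -> Set[int]:
        """Row ids stored in grid cells that overlap the box."""
        (x_lo, x_hi), (y_lo, y_hi) = box
        rows: Set[int] = set()
        for cx in range(x_lo // self.cell, x_hi // self.cell + 1):
            for cy in range(y_lo // self.cell, y_hi // self.cell + 1):
                rows.update(self.cells.get((cx, cy), ()))
        return rows


def range_access(index: GridIndex, queries: Dict[str, Box]):
    """Fetch only tuples near some query range instead of scanning everything."""
    result = defaultdict(set)
    touched: Set[int] = set()
    for qid, box in queries.items():
        for row_id in index.candidates(box):
            touched.add(row_id)
            if matches(index.data[row_id], box):  # discard overfetched tuples
                result[row_id].add(qid)
    return dict(result), len(touched)


# Hypothetical data and query batch, chosen only for illustration.
data = [(x, y) for x in range(0, 100, 5) for y in range(0, 100, 5)]
queries = {"q1": ((10, 20), (10, 20)), "q2": ((15, 30), (40, 45))}

scan_result, scan_touched = shared_filter_scan(data, queries)
idx_result, idx_touched = range_access(GridIndex(data, cell=10), queries)
print(scan_result == idx_result, scan_touched, idx_touched)
```

Both paths tag each qualifying tuple with the queries it satisfies, so downstream shared operators could consume either output. The difference is in how many tuples are touched: the scan visits every tuple, while the range-based path visits only tuples in grid cells that overlap some query box.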

Index Terms

SH2O: Efficient Data Access for Work-Sharing Databases
1. Information systems
  1. Data management systems
    1. Data structures
      1. Data access methods
    2. Database management system engines
      1. Database query processing

Published in
Proceedings of the ACM on Management of Data Volume 1, Issue 3
PACMMOD
September 2023
472 pages
EISSN: 2836-6573
DOI: 10.1145/3632968
Editor: Divyakant Agrawal, UC Santa Barbara, United States
Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
analytical query processing
databases
indexing
work sharing