research-article

SH2O: Efficient Data Access for Work-Sharing Databases

Published: 13 November 2023

Abstract

Interactive applications require processing tens to hundreds of concurrent analytical queries within tight time constraints. In such setups, where high concurrency causes contention, work-sharing databases are critical for improving scalability and for bounding the increase in response time. However, as such databases share data access using full scans and expensive shared filters, they suffer from a data-access bottleneck that jeopardizes interactivity.

We present SH2O: a novel data-access operator that addresses the data-access bottleneck of work-sharing databases. SH2O is based on the idea that an access pattern based on judiciously selected multidimensional ranges can replace a set of shared filters. To exploit the idea in an efficient and scalable manner, SH2O uses a three-tier approach: i) it uses spatial indices to efficiently access the ranges without overfetching, ii) it uses an optimizer to choose which filters to replace such that it maximizes cost-benefit for index accesses, and iii) it exploits partitioning schemes and independently accesses each data partition to reduce the number of filters in the access pattern. Furthermore, we propose a tuning strategy that chooses a partitioning and indexing scheme that minimizes SH2O's cost for a target workload. Our evaluation shows a speedup of 1.8x to 22.2x for batches of hundreds of data-access-bound queries.
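
The core idea, replacing a set of shared filters with accesses to selected ranges, can be illustrated with a minimal sketch. This is not SH2O's implementation: it assumes a single sorted attribute instead of multidimensional data, uses binary search as a stand-in for a spatial index, and omits the optimizer and partitioning tiers. The names `shared_scan`, `merge_ranges`, and `range_access` are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, Iterable, List, Tuple

Range = Tuple[int, int]  # inclusive [lo, hi] predicate on one attribute (assumption)


def shared_scan(rows: List[int], queries: Dict[str, Range]) -> Dict[str, List[int]]:
    """Baseline: one full scan; every row is tested against every query's filter."""
    out: Dict[str, List[int]] = {q: [] for q in queries}
    for v in rows:
        for q, (lo, hi) in queries.items():
            if lo <= v <= hi:
                out[q].append(v)
    return out


def merge_ranges(ranges: Iterable[Range]) -> List[Range]:
    """Union overlapping ranges so that no row is fetched twice."""
    merged: List[Range] = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def range_access(sorted_rows: List[int], queries: Dict[str, Range]) -> Dict[str, List[int]]:
    """Fetch only rows inside the union of query ranges (binary search on a
    sorted column stands in for an index), then route them to the queries."""
    out: Dict[str, List[int]] = {q: [] for q in queries}
    for lo, hi in merge_ranges(queries.values()):
        start = bisect_left(sorted_rows, lo)
        end = bisect_right(sorted_rows, hi)
        for v in sorted_rows[start:end]:  # rows outside every range are never touched
            for q, (qlo, qhi) in queries.items():
                if qlo <= v <= qhi:
                    out[q].append(v)
    return out


rows = list(range(100))
queries = {"q1": (10, 20), "q2": (15, 30), "q3": (70, 75)}
assert shared_scan(rows, queries) == range_access(rows, queries)
```

In this toy batch, the range-based access touches only the rows in [10, 30] and [70, 75] rather than all 100 rows, while producing the same per-query results as the full shared scan.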

Published in

Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 3
September 2023
472 pages
EISSN: 2836-6573
DOI: 10.1145/3632968

Copyright © 2023 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States
