Skip to main content
Log in

Comprehensive and Efficient Workload Summarization

  • Schwerpunktbeitrag
  • Published:
Datenbank-Spektrum Aims and scope Submit manuscript

Abstract

This work studies the problem of constructing a representative workload from a given input analytical query workload where the former serves as an approximation with guarantees of the latter. We discuss our work in the context of workload analysis and monitoring. As an example, evolving system usage patterns in a database system can cause load imbalance and performance regressions which can be controlled by monitoring system usage patterns, i.e., a representative workload, over time. To construct such a workload in a principled manner, we formalize the notions of workload representativity and coverage. These metrics capture the intuition that the distribution of features in a compressed workload should match a target distribution, increasing representativity, and include common queries as well as outliers, increasing coverage. We show that solving this problem optimally is computationally hard and present a novel greedy algorithm that provides approximation guarantees. We compare our techniques to established algorithms in this problem space such as sampling and clustering, and demonstrate advantages and key trade-offs.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3

Similar content being viewed by others

Notes

  1. K‑medoids is an iterative greedy algorithm that chooses \(k\) cluster centers, assigns all points to the closest center and iteratively refines the points in each cluster.

  2. Hierarchical clustering is a top-down approach where all points start in a single cluster and the algorithm recursively splits the points into \(k\) disjoint clusters.

  3. A small sample is chosen to make sure that clustering algorithms can finish execution.

References

  1. (2020) SQL Server execution statistics. https://docs.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-exec-query-stats-transact-sql?view=sql-server-ver15. Accessed 17 Nov 2020

  2. (2020) TPC‑H Benchmark. http://www.tpc.org/tpch. Accessed 17 Nov 2020

  3. Agrawal S, Chaudhuri S, Kollar L, Marathe A, Narasayya V, Syamala M (2005) Database tuning advisor for microsoft sql server 2005. In: Proceedings of the 2005 ACM SIGMOD international conference on Management of data, pp 930–932

    Chapter  Google Scholar 

  4. Chaudhuri S, Gupta Kumar A, Narasayya V (2002) Compressing sql workloads. In: Proceedings of the 2002 ACM SIGMOD international conference on Management of data. ACM, pp 488–499

    Chapter  Google Scholar 

  5. Chaudhuri S, Narasayya V, Ganesan P (2003) Primitives for workload summarization and implications for SQL. In: Proceedings 2003 VLDB Conference. Elsevier, pp 730–741

    Chapter  Google Scholar 

  6. Cooper BF, Silberstein A, Tam E, Ramakrishnan R, Sears R (2010) Benchmarking cloud serving systems with YCSB. In: Proceedings of the 1st ACM symposium on Cloud computing, pp 143–154

    Chapter  Google Scholar 

  7. Shaleen Deep, Gruenheid A, Paraschos Koutris, Naughton J, Viglas S (2020) Comprehensive and efficient workload compression. https://arxiv.org/abs/2011.05549. Accessed 17 Nov 2020

  8. DeWitt DJ (1993) The wisconsin benchmark: past, present, and future

    Google Scholar 

  9. Jain S, Howe B (2019) Query2Vec: NLP meets databases for generalized workload Analytics. CIDR.

    Google Scholar 

  10. Krause A, Guestrin C (2005) A note on the budgeted maximization of submodular functions. Carnegie Mellon University. Center for Automated Learning and Discovery

    Google Scholar 

  11. Kul G, Luong D, Xie T, Coonan P, Chandola V, Kennedy O, Upadhyaya S (2016) Ettu: Analyzing query intents in corporate databases. In: International World Wide Web Conferences Steering Committee (ed) Proceedings of the 25th International Conference Companion on World Wide Web, pp 463–466

    Google Scholar 

  12. Kul G, Luong D, Xie T, Coonan P, Chandola V, Kennedy O, Upadhyaya S (2016) Summarizing large query logs in ettu. arXiv preprint arXiv:1608.01013

    Google Scholar 

  13. Leskovec J, Krause A, Guestrin C, Faloutsos C, VanBriesen J, Glance N (2007) Cost-effective outbreak detection in networks. In: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 420–429

    Chapter  Google Scholar 

  14. Macke S, Yiming Z, Silu H, Parameswaran A (2018) Adaptive sampling for rapidly matching histograms. Proc Vldb Endow 11(10):1262–1275

    Article  Google Scholar 

  15. O’Neil PE, O’Neil EJ, Chen X (2007) The star schema benchmark (SSB)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Shaleen Deep.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the article.

Additional information

This work was done while the authors Gruenheid, Viglas and Naughton were employed at Google.

Rights and permissions

Springer Nature oder sein Lizenzgeber (z.B. eine Gesellschaft oder ein*e andere*r Vertragspartner*in) hält die ausschließlichen Nutzungsrechte an diesem Artikel kraft eines Verlagsvertrags mit dem/den Autor*in(nen) oder anderen Rechteinhaber*in(nen); die Selbstarchivierung der akzeptierten Manuskriptversion dieses Artikels durch Autor*in(nen) unterliegt ausschließlich den Bedingungen dieses Verlagsvertrags und dem geltenden Recht.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Deep, S., Gruenheid, A., Koutris, P. et al. Comprehensive and Efficient Workload Summarization. Datenbank Spektrum 22, 249–256 (2022). https://doi.org/10.1007/s13222-022-00427-w

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s13222-022-00427-w

Keywords

Navigation