Introducing Skew into the TPC-H Benchmark

  • Conference paper
Topics in Performance Evaluation, Measurement and Characterization (TPCTC 2011)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 7144)

Abstract

While uniform data distributions were a design choice for the TPC-D benchmark and its successor TPC-H, it has been universally recognized that data skew is prevalent in data warehousing. A modern benchmark should therefore provide a test bed to evaluate the ability of database engines to handle skew. This paper presents a concrete and practical way to introduce skew into the TPC-H data model by modifying the customer and supplier tables to reflect non-uniform customer and supplier populations. The first proposal consists of defining customer and supplier populations by nation that are roughly proportional to the actual nation populations. In the second proposal, nations are divided into two groups, one with large, equal populations and the other with small, equal populations. We then experiment with the proposed skew models to show how the optimizer of a parallel system can recognize skew and potentially produce different plans depending on whether skew is present. Query performance with the proposed method is compared against performance with the original uniform TPC-H distributions. Finally, we present an approach for introducing skew into TPC-H with the current query set that is compatible with the current benchmark specification rules and could be implemented today.
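The two skew models described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' generator: the function names, the `large_share` parameter, and the relative population figures in the demo are hypothetical and chosen only for illustration, since the abstract does not give the actual parameters. The sketch shows how a nation could be assigned to each customer or supplier row under each model.

```python
import random
from collections import Counter


def proportional_weights(populations):
    """Model 1 (sketch): each nation's weight is proportional to its population."""
    total = sum(populations.values())
    return {nation: pop / total for nation, pop in populations.items()}


def two_group_weights(nations, large_nations, large_share):
    """Model 2 (sketch): two groups of nations, equal weights within each group.

    `large_share` is a hypothetical parameter: the fraction of all rows
    that falls into the large group as a whole.
    """
    large = [n for n in nations if n in large_nations]
    small = [n for n in nations if n not in large_nations]
    weights = {n: large_share / len(large) for n in large}
    weights.update({n: (1.0 - large_share) / len(small) for n in small})
    return weights


def assign_nations(num_rows, weights, seed=0):
    """Draw a nation for each row (customer or supplier) according to the weights."""
    rng = random.Random(seed)
    nations = list(weights)
    return rng.choices(nations, weights=[weights[n] for n in nations], k=num_rows)


if __name__ == "__main__":
    # Illustrative relative sizes only, not real population data.
    relative_pop = {"CHINA": 1300, "INDIA": 1200, "UNITED STATES": 300,
                    "FRANCE": 65, "PERU": 30}

    model1 = proportional_weights(relative_pop)
    model2 = two_group_weights(list(relative_pop), {"CHINA", "INDIA"}, large_share=0.9)

    for name, weights in (("proportional", model1), ("two-group", model2)):
        counts = Counter(assign_nations(10_000, weights, seed=42))
        print(name, counts.most_common())
```

Under either model, a join or aggregation keyed on nation sees a few heavily populated values and many lightly populated ones, which is the situation a parallel optimizer has to detect.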

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Crolotte, A., Ghazal, A. (2012). Introducing Skew into the TPC-H Benchmark. In: Nambiar, R., Poess, M. (eds) Topics in Performance Evaluation, Measurement and Characterization. TPCTC 2011. Lecture Notes in Computer Science, vol 7144. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32627-1_10

  • DOI: https://doi.org/10.1007/978-3-642-32627-1_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32626-4

  • Online ISBN: 978-3-642-32627-1

  • eBook Packages: Computer Science, Computer Science (R0)
