Benchmarking SQL-on-Hadoop Systems: TPC or Not TPC?

  • Conference paper
Big Data Benchmarking (WBDB 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8991)

Abstract

Benchmarks are important tools for evaluating systems, as long as their results are transparent and reproducible and they are conducted with due diligence. Today, many SQL-on-Hadoop vendors use the data generators and queries of existing TPC benchmarks but fail to adhere to the rules, producing results that are not transparent. As the SQL-on-Hadoop movement continues to gain traction, it is important to bring some order to this “wild west” of benchmarking. First, new rules and policies should be defined to satisfy the demands of the new generation of SQL systems. The new benchmark evaluation schemes should be inexpensive, effective, and open enough to embrace the variety of SQL-on-Hadoop systems and their corresponding vendors. Second, adhering to the new standards requires industry commitment and collaboration. In this paper, we discuss the problems we observe in current benchmarking practices and present our proposal for bringing standardization to the SQL-on-Hadoop space.

Author information

Corresponding author

Correspondence to Avrilia Floratou.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Floratou, A., Özcan, F., Schiefer, B. (2015). Benchmarking SQL-on-Hadoop Systems: TPC or Not TPC?. In: Rabl, T., Sachs, K., Poess, M., Baru, C., Jacobsen, H.-A. (eds) Big Data Benchmarking. WBDB 2014. Lecture Notes in Computer Science, vol 8991. Springer, Cham. https://doi.org/10.1007/978-3-319-20233-4_7

  • DOI: https://doi.org/10.1007/978-3-319-20233-4_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-20232-7

  • Online ISBN: 978-3-319-20233-4

  • eBook Packages: Computer Science, Computer Science (R0)
