From TPC-C to Big Data Benchmarks: A Functional Workload Model

  • Conference paper
Specifying Big Data Benchmarks (WBDB 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8163)

Abstract

Big data systems help organizations store, manipulate, and derive value from vast amounts of data. Relational databases and MapReduce are the two most prominent technologies for such systems. Organizations use them to perform complex analysis on diverse and unconventional data types with fast-growing data volumes. As more big data systems are deployed, the industry faces the challenge of developing representative benchmarks that can evaluate the capabilities of competing implementations. In this position paper, we argue for building future big data benchmarks using what we call a “functional workload model”. This concept draws on combined experiences from standard benchmarks, exemplified by TPC-C. The functional workload model describes the functional goals that the system must achieve, the data access patterns, the load variations over time, and the computation required to achieve the functional goals. Abstracting functional workload models from empirical studies of MapReduce deployments represents the first step towards building truly representative big data benchmarks.
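
The four components of the functional workload model named in the abstract can be pictured as a simple record. The sketch below is only an illustration under stated assumptions: the class name `FunctionalWorkloadModel`, its field names, the `peak_to_mean_load` helper, and all sample values are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FunctionalWorkloadModel:
    """Hypothetical container for the four components named in the abstract."""

    functional_goals: List[str]      # what the system must achieve
    data_access_patterns: List[str]  # how the data is read and written
    load_over_time: List[float]      # relative load per time interval
    computation: List[str]           # computation needed to meet the goals

    def peak_to_mean_load(self) -> float:
        """Return peak load divided by mean load, a crude burstiness summary."""
        if not self.load_over_time:
            raise ValueError("load_over_time must not be empty")
        mean = sum(self.load_over_time) / len(self.load_over_time)
        if mean == 0:
            raise ValueError("mean load must be non-zero")
        return max(self.load_over_time) / mean


# Invented sample values, loosely inspired by an order-entry scenario.
model = FunctionalWorkloadModel(
    functional_goals=["enter new orders", "record payments"],
    data_access_patterns=["small random reads and writes keyed by warehouse"],
    load_over_time=[1.0, 2.0, 4.0, 1.0],
    computation=["update stock levels", "compute order totals"],
)
print(model.peak_to_mean_load())
```

Keeping the functional goals separate from the computation mirrors the abstract's distinction between what the system must achieve and the work required to achieve it; a real benchmark specification would need far richer descriptions of each component.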

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chen, Y., Raab, F., Katz, R. (2014). From TPC-C to Big Data Benchmarks: A Functional Workload Model. In: Rabl, T., Poess, M., Baru, C., Jacobsen, HA. (eds) Specifying Big Data Benchmarks. WBDB 2012. Lecture Notes in Computer Science, vol 8163. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53974-9_4

  • DOI: https://doi.org/10.1007/978-3-642-53974-9_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-53973-2

  • Online ISBN: 978-3-642-53974-9

  • eBook Packages: Computer Science, Computer Science (R0)
