Abstract
Big data systems help organizations store, manipulate, and derive value from vast amounts of data. Relational databases and MapReduce are the two most prominent technologies for such systems. Organizations use them to perform complex analysis on diverse and unconventional data types with fast-growing data volumes. As more big data systems are deployed, the industry faces the challenge of developing representative benchmarks that can evaluate the capabilities of competing implementations. In this position paper, we argue for building future big data benchmarks using what we call a “functional workload model”. This concept draws on combined experience from standard benchmarks, exemplified by TPC-C. The functional workload model describes the functional goals that the system must achieve, the data access patterns, the load variations over time, and the computation required to achieve the functional goals. Abstracting functional workload models from empirical studies of MapReduce deployments represents the first step towards building truly representative big data benchmarks.
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chen, Y., Raab, F., Katz, R. (2014). From TPC-C to Big Data Benchmarks: A Functional Workload Model. In: Rabl, T., Poess, M., Baru, C., Jacobsen, HA. (eds) Specifying Big Data Benchmarks. WBDB 2012. Lecture Notes in Computer Science, vol 8163. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53974-9_4
DOI: https://doi.org/10.1007/978-3-642-53974-9_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-53973-2
Online ISBN: 978-3-642-53974-9
eBook Packages: Computer Science, Computer Science (R0)