Abstract
We live in an era of big data, and big data applications are becoming increasingly pervasive. How to benchmark data center computer systems running big data applications (big data systems, for short) is a hot topic. In this paper, we focus on measuring the performance impact of diverse applications and scalable volumes of data sets on big data systems. For four typical data analysis applications, an important class of big data applications, our experiments yield two major results. First, the data scale has a significant impact on the performance of big data systems, so big data benchmarks must provide scalable volumes of data sets. Second, even though all four applications use simple algorithms, their performance trends differ as the data scale increases; hence, benchmarking big data systems must consider not only a variety of data sets but also a variety of applications.
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jia, Z. et al. (2014). The Implications of Diverse Applications and Scalable Data Sets in Benchmarking Big Data Systems. In: Rabl, T., Poess, M., Baru, C., Jacobsen, HA. (eds) Specifying Big Data Benchmarks. WBDB 2012. Lecture Notes in Computer Science, vol 8163. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53974-9_5
DOI: https://doi.org/10.1007/978-3-642-53974-9_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-53973-2
Online ISBN: 978-3-642-53974-9
eBook Packages: Computer Science, Computer Science (R0)