Abstract
Benchmarking Big Data systems is an open challenge. Existing micro-benchmarks (e.g., TeraSort) do not represent an end-to-end, real-world scenario. To address this, BigBench has been proposed as a new benchmark for Big Data analytics that aims to become an industry standard. Using BigBench, we have been collaborating with the Apache open source community on performance tuning and optimization for the Hadoop ecosystem. In this paper, we share our contributions to BigBench and present our tuning and optimization experience along with the benchmark results.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Tang, Y. et al. (2016). Accelerating BigBench on Hadoop. In: Rabl, T., Nambiar, R., Baru, C., Bhandarkar, M., Poess, M., Pyne, S. (eds) Big Data Benchmarking. WBDB 2015. Lecture Notes in Computer Science, vol 10044. Springer, Cham. https://doi.org/10.1007/978-3-319-49748-8_7
DOI: https://doi.org/10.1007/978-3-319-49748-8_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49747-1
Online ISBN: 978-3-319-49748-8
eBook Packages: Computer Science, Computer Science (R0)