Accelerating BigBench on Hadoop

Tang, Yan; Bhaskar, Gowda; Chen, Jack; Hao, Xin; Zhou, Yi; Yao, Yi; Wang, Lifeng

doi:10.1007/978-3-319-49748-8_7

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10044)

Abstract

Benchmarking Big Data systems is an open challenge. Existing micro-benchmarks (e.g., TeraSort) do not represent end-to-end, real-world scenarios. To address this gap, BigBench has been proposed as a new benchmark for Big Data analytics, intended to become an industry standard. Using BigBench, we have continued our collaboration with the Apache open source community on performance tuning and optimization for the Hadoop ecosystem. In this paper, we share our contributions to BigBench and present our tuning and optimization experience along with the benchmark results.

Towards a Complete BigBench Implementation

From BigBench to TPCx-BB: Standardization of a Big Data Benchmark

Experiences and Lessons in Practice Using TPCx-BB Benchmarks

Author information

Authors and Affiliations

Intel Corporation, Shanghai, China
Yan Tang, Jack Chen, Xin Hao, Yi Zhou, Yi Yao & Lifeng Wang
Intel Corporation, Santa Clara, California, USA
Gowda Bhaskar

Corresponding authors

Correspondence to Yan Tang, Gowda Bhaskar, Jack Chen, Xin Hao, Yi Zhou, Yi Yao or Lifeng Wang.

Editor information

Editors and Affiliations

Technical University of Berlin, Berlin, Germany
Tilmann Rabl
Cisco Systems, Inc., San Jose, California, USA
Raghunath Nambiar
University of California at San Diego, La Jolla, California, USA
Chaitanya Baru
Ampool, Inc., Santa Clara, California, USA
Milind Bhandarkar
Oracle Corporation, Redwood Shores, California, USA
Meikel Poess
Indian Institute of Public Health, Hyderabad, India
Saumyadipta Pyne

About this paper

Cite this paper

Tang, Y. et al. (2016). Accelerating BigBench on Hadoop. In: Rabl, T., Nambiar, R., Baru, C., Bhandarkar, M., Poess, M., Pyne, S. (eds) Big Data Benchmarking. WBDB 2015. Lecture Notes in Computer Science, vol 10044. Springer, Cham. https://doi.org/10.1007/978-3-319-49748-8_7

DOI: https://doi.org/10.1007/978-3-319-49748-8_7
Published: 01 December 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49747-1
Online ISBN: 978-3-319-49748-8
eBook Packages: Computer Science, Computer Science (R0)
