A BigBench Implementation in the Hadoop Ecosystem

Chowdhury, Badrul; Rabl, Tilmann; Saadatpanah, Pooya; Du, Jiang; Jacobsen, Hans-Arno

doi:10.1007/978-3-319-10596-3_1

Conference paper
First Online: 01 January 2014

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8585)

Abstract

BigBench is the first proposal for an end-to-end big data analytics benchmark. It features a rich set of complex, realistic queries and was developed on the basis of the decision support benchmark TPC-DS. The first proof-of-concept implementation was built for the Teradata Aster parallel database system, with the queries formulated in the proprietary SQL-MR query language. To test other systems, the queries have to be translated.

This paper presents an alternative implementation of BigBench for the Hadoop ecosystem. All 30 BigBench queries were realized using Apache Hive, Apache Hadoop, Apache Mahout, and NLTK. We present the design choices we made and show a proof-of-concept evaluation.

Author information

Authors and Affiliations

Middleware Systems Research Group, University of Toronto, Toronto, Canada
Badrul Chowdhury, Tilmann Rabl & Hans-Arno Jacobsen
Database Research Group, University of Toronto, Toronto, Canada
Pooya Saadatpanah & Jiang Du

Corresponding author

Correspondence to Tilmann Rabl.

Editor information

Editors and Affiliations

University of Toronto, Toronto, Ontario, Canada
Tilmann Rabl
Cisco Systems, Inc., San José, USA
Raghunath Nambiar
Oracle Corporation, Redwood Shores, USA
Meikel Poess
Pivotal Software, Inc., Palo Alto, USA
Milind Bhandarkar
University of Toronto, Toronto, Canada
Hans-Arno Jacobsen
University of California at San Diego, La Jolla, USA
Chaitanya Baru

Cite this paper

Chowdhury, B., Rabl, T., Saadatpanah, P., Du, J., Jacobsen, H.-A. (2014). A BigBench Implementation in the Hadoop Ecosystem. In: Rabl, T., Nambiar, R., Poess, M., Bhandarkar, M., Jacobsen, H.-A., Baru, C. (eds) Advancing Big Data Benchmarks. WBDB 2013. Lecture Notes in Computer Science, vol 8585. Springer, Cham. https://doi.org/10.1007/978-3-319-10596-3_1

DOI: https://doi.org/10.1007/978-3-319-10596-3_1
Published: 09 October 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10595-6
Online ISBN: 978-3-319-10596-3
eBook Packages: Computer Science, Computer Science (R0)
