Abstract
BigBench is the first proposal for an end to end big data analytics benchmark. It features a rich query set with complex, realistic queries. BigBench was developed based on the decision support benchmark TPC-DS. The first proof-of-concept implementation was built for the Teradata Aster parallel database system and the queries were formulated in the proprietary SQL-MR query language. To test other systems, the queries have to be translated.
In this paper, an alternative implementation of BigBench for the Hadoop ecosystem is presented. All 30 queries of BigBench were realized using Apache Hive, Apache Hadoop, Apache Mahout, and NLTK. We will present the different design choices we took and show a proof of concept evaluation.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsNotes
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
References
Carey, M.J.: BDMS performance evaluation: practices, pitfalls, and possibilities. In: Nambiar, R., Poess, M. (eds.) TPCTC 2012. LNCS, vol. 7755, pp. 108–123. Springer, Heidelberg (2013)
Ghazal, A., Rabl, T., Hu, M., Raab, F., Poess, M., Crolotte, A., Jacobsen., H.A.: BigBench: towards an industry standard benchmark for big data analytics. In: Proceedings of the ACM SIGMOD Conference (2013)
Pöss, M., Nambiar, R.O., Walrath, D.: Why you should run TPC-DS: a workload analysis. In: VLDB, pp. 1138–1149 (2007)
Rabl, T., Ghazal, A., Hu, M., Crolotte, A., Raab, F., Poess, M., Jacobsen, H.-A.: BigBench specification V0.1. In: Rabl, T., Poess, M., Baru, C., Jacobsen, H.-A. (eds.) WBDB 2012. LNCS, vol. 8163, pp. 164–201. Springer, Heidelberg (2014)
Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., Byers, A.H.: Big data: the next frontier for innovation, competition, and productivity. Technical report, McKinsey Global Institute (2011). http://www.mckinsey.com/insights/mgi/research/technology_and_innovation/big_data_the_next_frontier_for_innovation
Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)
Shvachko, K., Kuang, H., Radia, S., Chansler, R.: The Hadoop distributed file system. In: 26th IEEE Symposium on Mass Storage Systems and Technologies, pp. 1–10 (2010)
Thusoo, A., Sarma, J.S., Jain, N., Shao, Z., Chakka, P., Anthony, S., Liu, H., Wyckoff, P., Murthy, R.: Hive: a warehousing solution over a Map-Reduce framework. Proc. VLDB Endow. 2(2), 1626–1629 (2009)
Bird, S., Klein, E., Loper, E., Baldridge, J.: Multidisciplinary instruction with the natural language toolkit. In: Proceedings of the Third Workshop on Issues in Teaching Computational Linguistics, TeachCL ’08, pp. 62–70 (2008)
Moussa, R.: TPC-H benchmark analytics scenarios and performances on Hadoop data clouds. In: Benlamri, R. (ed.) NDT 2012, Part I. CCIS, vol. 293, pp. 220–234. Springer, Heidelberg (2012)
Kim, K., Jeon, K., Han, H., Kim, S., Jung, H., Yeom, H.: MRBench: a benchmark for MapReduce framework. In: 14th IEEE International Conference on Parallel and Distributed Systems, 2008, ICPADS ’08, December 2008, pp. 11–18 (2008)
Zhao, J.M., Wang, W., Liu, X.: Big data benchmark - Big DS. In: Rabl, T., Raghunath, N., Meikel, P., Milind, B., Jacobsen, H.-A., Chaitanya, B. (eds.) WBDB 2013. LNCS, vol. 8585, pp. 49–57. Springer, Heidelberg (2014)
Huang, S., Huang, J., Dai, J., Xie, T., Huang, B.: The HiBench benchmark suite: characterization of the MapReduce-based data analysis. In: ICDEW (2010)
Yi, L., Dai, J.: Experience from hadoop benchmarking with HiBench: from micro-benchmarks toward end-to-end pipelines. In: Rabl, T., Raghunath, N., Meikel, P., Milind, B., Jacobsen, H.-A., Chaitanya, B. (eds.) WBDB 2013. LNCS, vol. 8585, pp. 43–48. Springer, Heidelberg (2014)
Pavlo, A., Paulson, E., Rasin, A., Abadi, D.J., DeWitt, D.J., Madden, S., Stonebraker, M.: A comparison of approaches to large-scale data analysis. In: SIGMOD ’09: Proceedings of the 35th SIGMOD International Conference on Management of Data, pp. 165–178 (2009)
Wang, L., Zhan, J., Luo, C., Zhu, Y., Yang, Q., He, Y., Gao, W., Jia, Z., Shi, Y., Zhang, S., Zhen, C., Lu, G., Zhan, K., Li, X., Qiu, B.: BigDataBench: a big data benchmark suite from internet services. In: Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture. HPCA (2014)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Chowdhury, B., Rabl, T., Saadatpanah, P., Du, J., Jacobsen, HA. (2014). A BigBench Implementation in the Hadoop Ecosystem. In: Rabl, T., Raghunath, N., Poess, M., Bhandarkar, M., Jacobsen, HA., Baru, C. (eds) Advancing Big Data Benchmarks. WBDB WBDB 2013 2013. Lecture Notes in Computer Science(), vol 8585. Springer, Cham. https://doi.org/10.1007/978-3-319-10596-3_1
Download citation
DOI: https://doi.org/10.1007/978-3-319-10596-3_1
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10595-6
Online ISBN: 978-3-319-10596-3
eBook Packages: Computer ScienceComputer Science (R0)