Abstract
The TPCx-BigBench (TPCx-BB) is a TPC Express benchmark designed to measure the performance of big data analytics systems. It contains 30 use cases that simulate big data processing, big data storage, big data analytics, and reporting. We have used this benchmark to evaluate the performance of software and hardware components for big data systems. It covers a wide variety of data types and scales well enough to address both data-size and node-scaling problems, and it has given us many meaningful insights for designing analytics systems. At the same time, we found that we cannot rely on TPCx-BB alone to evaluate and design end-to-end big data systems, because there are gaps between an analytics system and a real end-to-end system. The complete data flow of a real end-to-end system includes data ingestion, which moves data from where it originates into a system where it can be stored and analyzed, such as Hadoop. Ingesting data fast enough for a business to maintain its competitive advantage can be challenging, yet TPCx-BB does not help evaluate the performance of software and hardware for data ingestion. Big data is commonly characterized by three dimensions: Volume, Variety, and Velocity. Velocity refers to the speed of data processing: real-time or near-real-time. As big data technology becomes widely used, real-time and near-real-time processing are becoming more common, and they impose very strict bandwidth and latency requirements. TPCx-BB does not help evaluate the performance of software and hardware for real-time processing either. This paper discusses our experiences and lessons from using TPCx-BB in practice. We then offer advice on extending TPCx-BB to cover data ingestion and real-time processing, and share some ideas on how to implement this extended coverage.
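TPCx-BB reports no ingestion metrics, so an extension of the kind proposed here would need at least the two quantities the abstract singles out: how fast data can be moved into the system (throughput) and how long each record takes to get there (latency). The Python code below is a minimal sketch of such a measurement under stated assumptions, not the authors' method. It assumes a hypothetical synchronous `ingest` callback standing in for a real client (for example, a Kafka producer or an HDFS writer); `measure_ingestion` and `percentile` are illustrative names that are not part of TPCx-BB.

```python
import math
import time
from typing import Callable, Iterable, List, Tuple


def measure_ingestion(
    records: Iterable[bytes],
    ingest: Callable[[bytes], None],
) -> Tuple[float, List[float]]:
    """Feed every record to `ingest` (a hypothetical sink callback) and return
    (throughput in records per second, per-record call latencies in seconds)."""
    latencies: List[float] = []
    start = time.perf_counter()
    for record in records:
        t0 = time.perf_counter()
        ingest(record)  # assumption: a blocking call; real clients may be async
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    throughput = len(latencies) / elapsed if elapsed > 0 else float("inf")
    return throughput, latencies


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile for pct in (0, 100]."""
    if not values:
        raise ValueError("percentile of an empty sequence")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    sink: List[bytes] = []  # hypothetical in-memory stand-in for Kafka or HDFS
    events = (f"event-{i}".encode() for i in range(10_000))
    tput, lat = measure_ingestion(events, sink.append)
    print(f"ingested {len(sink)} records at {tput:,.0f} records/s")
    print(f"p50 latency: {percentile(lat, 50) * 1e6:.2f} us")
    print(f"p99 latency: {percentile(lat, 99) * 1e6:.2f} us")
```

Because real ingestion clients usually batch and send asynchronously, timing the call alone would understate end-to-end latency; a fuller harness would timestamp each record at the source and again when it becomes visible in the target store, which is closer to the real-time latency requirement described above.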
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Wang, K., Bian, B., Cao, P., Riess, M. (2018). Experiences and Lessons in Practice Using TPCx-BB Benchmarks. In: Nambiar, R., Poess, M. (eds) Performance Evaluation and Benchmarking for the Analytics Era. TPCTC 2017. Lecture Notes in Computer Science, vol 10661. Springer, Cham. https://doi.org/10.1007/978-3-319-72401-0_7
DOI: https://doi.org/10.1007/978-3-319-72401-0_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-72400-3
Online ISBN: 978-3-319-72401-0
eBook Packages: Computer Science, Computer Science (R0)