Experiences and Lessons in Practice Using TPCx-BB Benchmarks

Wang, Kebing; Bian, Bianny; Cao, Paul; Riess, Mike

doi:10.1007/978-3-319-72401-0_7

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 10661)

Included in the following conference series:

Technology Conference on Performance Evaluation and Benchmarking

Abstract

TPCx-BigBench (TPCx-BB) is a TPC Express benchmark designed to measure the performance of big data analytics systems. It contains 30 use cases that simulate big data processing, big data storage, big data analytics, and reporting. We have used this benchmark to evaluate the performance of software and hardware components for big data systems. It covers a wide range of data types and scales well enough to address data size and node scaling problems. Through this benchmark we have gained many meaningful insights into the design of analytic systems. At the same time, we found that we cannot rely on TPCx-BB alone to evaluate and design end-to-end big data systems, because there are gaps between an analytics system and a real end-to-end system. The data flow of a real end-to-end system should include data ingestion, which moves data from where it originates into a system where it can be stored and analyzed, such as Hadoop. Ingesting data at a reasonable speed can be challenging for businesses that want to maintain a competitive advantage. However, TPCx-BB cannot help with evaluating the performance of software and hardware for data ingestion. Big data has three dimensions: Volume, Variety, and Velocity. Velocity refers to high-speed data processing, in real time or near real time. As big data technology becomes widely used, real-time and near real-time processing are becoming more popular. Real-time processing imposes very strict limits on bandwidth and latency. TPCx-BB likewise cannot help with evaluating the performance of software and hardware for real-time processing. This paper discusses these experiences and lessons from using TPCx-BB in practice. We then offer some advice on extending TPCx-BB to cover data ingestion and real-time processing, and share some ideas on how to implement this extended coverage.

Author information

Authors and Affiliations

Intel Corporation, Zizhu Science Park, Shanghai, 200241, China
Kebing Wang, Bianny Bian & Mike Riess
Hewlett Packard Enterprise, 11445 Compaq Center W Dr, Houston, TX, 77070, USA
Paul Cao

Corresponding authors

Correspondence to Kebing Wang, Bianny Bian, Paul Cao, or Mike Riess.

Editor information

Editors and Affiliations

Cisco Systems, Inc., San Jose, California, USA
Raghunath Nambiar
Server Technologies, Oracle Corporation, Redwood Shores, California, USA
Meikel Poess

About this paper

Cite this paper

Wang, K., Bian, B., Cao, P., Riess, M. (2018). Experiences and Lessons in Practice Using TPCx-BB Benchmarks. In: Nambiar, R., Poess, M. (eds) Performance Evaluation and Benchmarking for the Analytics Era. TPCTC 2017. Lecture Notes in Computer Science, vol 10661. Springer, Cham. https://doi.org/10.1007/978-3-319-72401-0_7

DOI: https://doi.org/10.1007/978-3-319-72401-0_7
Published: 30 December 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-72400-3
Online ISBN: 978-3-319-72401-0
eBook Packages: Computer Science, Computer Science (R0)
