Experiences and Lessons in Practice Using TPCx-BB Benchmarks

Performance Evaluation and Benchmarking for the Analytics Era (TPCTC 2017)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 10661)

Abstract

TPCx-BigBench (TPCx-BB) is a TPC Express benchmark designed to measure the performance of big data analytics systems. It contains 30 use cases that simulate big data processing, big data storage, big data analytics, and reporting. We have used this benchmark to evaluate the performance of software and hardware components for big data systems. It covers different data types well and scales far enough to address data size and node scaling problems. Through this benchmark we have gained many meaningful insights into the design of analytics systems. At the same time, we found that we cannot rely on TPCx-BB alone to evaluate and design an end-to-end big data system, because there are gaps between an analytics system and a real end-to-end system. The whole data flow of a real end-to-end system should include data ingestion, which moves data from where it originates into a system, such as Hadoop, where it can be stored and analyzed. Ingesting data at a reasonable speed can be challenging for businesses that want to maintain a competitive advantage. However, TPCx-BB cannot help evaluate the performance of software and hardware for data ingestion. Big data has three dimensions: Volume, Variety, and Velocity. Velocity refers to high-speed data processing, in real time or near real time. As big data technology becomes widely used, real-time and near real-time processing are becoming more popular. Real-time processing imposes very strict limits on bandwidth and latency. TPCx-BB cannot help evaluate the performance of software and hardware for real-time processing either. This paper discusses these experiences and lessons from using TPCx-BB in practice. We then offer some advice on extending TPCx-BB to cover data ingestion and real-time processing, and share some ideas on how to implement the extended TPCx-BB coverage.

Corresponding authors

Correspondence to Kebing Wang, Bianny Bian, Paul Cao or Mike Riess.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Wang, K., Bian, B., Cao, P., Riess, M. (2018). Experiences and Lessons in Practice Using TPCx-BB Benchmarks. In: Nambiar, R., Poess, M. (eds) Performance Evaluation and Benchmarking for the Analytics Era. TPCTC 2017. Lecture Notes in Computer Science, vol 10661. Springer, Cham. https://doi.org/10.1007/978-3-319-72401-0_7

  • DOI: https://doi.org/10.1007/978-3-319-72401-0_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-72400-3

  • Online ISBN: 978-3-319-72401-0

  • eBook Packages: Computer Science, Computer Science (R0)
