DOI: 10.1145/3209950.3209956
research-article

Adding Velocity to BigBench

Published: 15 June 2018

ABSTRACT

BigBench, standardized as TPCx-BB, is a popular application benchmark that targets Big Data storage and processing systems. BigBench V2 addresses some of BigBench's limitations by introducing a simplified data model, semi-structured web logs in JSON format, and new queries that mandate late binding. However, it still covers only batch processing workloads and does not address the velocity characteristic of Big Data. This work extends the BigBench V2 benchmark with a data streaming component that simulates typical statistical and predictive analytics queries in a retail business scenario. Our approach preserves the existing BigBench design and introduces a new streaming component that supports two data streaming modes: active and passive. In active mode, data stream generation and processing happen in parallel, whereas in passive mode, the data stream is generated before the actual stream processing begins. The stream workload consists of five queries inspired by the existing 30 BigBench queries. To validate the proposed streaming extension, the two streaming modes were implemented and tested using Kafka and Spark Streaming. The experimental results demonstrate the feasibility of our benchmark design. Finally, we outline design challenges and future plans for improving the proposed BigBench extension.
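
The difference between the two streaming modes can be illustrated with a minimal sketch. This is not the authors' implementation: it uses an in-memory queue and a thread as stand-ins for Kafka and Spark Streaming, and the event layout, the per-item counting "query", and all function names (generate_events, count_items, run_passive, run_active) are hypothetical choices made only for illustration.

```python
import queue
import threading
from collections import Counter


def generate_events(n):
    """Yield n hypothetical web-log events; item ids cycle over 3 products."""
    for i in range(n):
        yield {"event_id": i, "item": f"item-{i % 3}"}


def count_items(events):
    """Stand-in streaming 'query': count events per item."""
    counts = Counter()
    for event in events:
        counts[event["item"]] += 1
    return dict(counts)


def run_passive(n):
    """Passive mode: materialize the whole stream first, then process it."""
    stream = list(generate_events(n))  # generation finishes before processing
    return count_items(stream)


def run_active(n):
    """Active mode: a producer thread generates while the consumer processes."""
    buf = queue.Queue(maxsize=16)  # bounded buffer between the two sides
    sentinel = None                # marks the end of the stream

    def producer():
        for event in generate_events(n):
            buf.put(event)
        buf.put(sentinel)

    worker = threading.Thread(target=producer)
    worker.start()
    # iter(callable, sentinel) keeps calling buf.get() until it returns sentinel
    result = count_items(iter(buf.get, sentinel))
    worker.join()
    return result
```

Both functions return the same counts for the same input; they differ only in whether generation overlaps with processing, which is the distinction the abstract draws between active and passive mode.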


      • Published in

        DBTest'18: Proceedings of the Workshop on Testing Database Systems
        June 2018
        49 pages
        ISBN: 9781450358262
        DOI: 10.1145/3209950

        Copyright © 2018 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article
        • Research
        • Refereed limited

        Acceptance Rates

        Overall Acceptance Rate: 31 of 56 submissions, 55%
