Adding Velocity to BigBench

Authors:
Todor Ivanov, Frankfurt Big Data Lab, Goethe University, Frankfurt am Main, Hessen, Germany
Patrick Bedué, Frankfurt Big Data Lab, Goethe University, Frankfurt am Main, Hessen, Germany
Ahmad Ghazal, Futurewei Technologies Inc., Santa Clara, CA, USA
Roberto V. Zicari, Frankfurt Big Data Lab, Goethe University, Frankfurt am Main, Hessen, Germany

DBTest'18: Proceedings of the Workshop on Testing Database Systems, June 2018, Article No. 6, Pages 1–6. https://doi.org/10.1145/3209950.3209956

Published: 15 June 2018

ABSTRACT

BigBench, standardized as TPCx-BB, is a popular application benchmark that targets Big Data storage and processing systems. BigBench V2 addresses some of the BigBench limitations by introducing a new, simplified data model, semi-structured web logs in JSON format, and new queries that mandate late binding. However, it still covers only batch processing workloads and does not address the velocity characteristic of Big Data. This work extends BigBench V2 with a data streaming component that simulates typical statistical and predictive analytics queries in a retail business scenario. Our approach preserves the existing BigBench design and introduces a new streaming component that supports two data streaming modes: active and passive. In active mode, data stream generation and processing happen in parallel, whereas in passive mode the data stream is generated in full before the actual stream processing begins. The stream workload consists of five queries inspired by the existing 30 BigBench queries. To validate the proposed streaming extension, both streaming modes were implemented and tested using Kafka and Spark Streaming. The experimental results demonstrate the feasibility of our benchmark design. Finally, we outline design challenges and future plans for improving the proposed BigBench extension.
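
The difference between the two streaming modes can be illustrated with a minimal sketch. This is not the authors' implementation, which used Kafka and Spark Streaming; it is a toy model that relies only on the Python standard library, with an in-memory queue standing in for the message broker. The event schema, the function names (`generate_events`, `process`, `run_passive`, `run_active`), and the click-count query are all hypothetical and chosen for illustration only.

```python
import queue
import threading
from collections import Counter

SENTINEL = None  # hypothetical end-of-stream marker


def generate_events(n):
    """Yield n illustrative web-log click events (assumed schema)."""
    for i in range(n):
        yield {"user": f"u{i % 3}", "item": f"item{i % 5}"}


def process(events):
    """Toy streaming query: count clicks per item."""
    counts = Counter()
    for event in events:
        counts[event["item"]] += 1
    return counts


def run_passive(n):
    # Passive mode: the whole stream is generated first, then processed.
    buffered = list(generate_events(n))
    return process(buffered)


def run_active(n):
    # Active mode: generation and processing run concurrently,
    # connected by a bounded queue that stands in for a broker.
    q = queue.Queue(maxsize=16)

    def producer():
        for event in generate_events(n):
            q.put(event)
        q.put(SENTINEL)

    worker = threading.Thread(target=producer)
    worker.start()
    # iter(q.get, SENTINEL) keeps reading until the sentinel arrives.
    counts = process(iter(q.get, SENTINEL))
    worker.join()
    return counts


if __name__ == "__main__":
    print(run_passive(100) == run_active(100))
```

Both functions compute the same result; what differs is when the data becomes available. In the passive variant the processing cost can be measured in isolation, while in the active variant the generator and the query compete for resources and the bounded queue applies back-pressure to the producer.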

Index Terms

Adding Velocity to BigBench
1. Computer systems organization
  1. Architectures
    1. Distributed architectures
2. Information systems
  1. Data management systems

Published in

DBTest'18: Proceedings of the Workshop on Testing Database Systems
June 2018
49 pages
ISBN:9781450358262
DOI:10.1145/3209950

Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Big Data Benchmarking
BigBench
Streaming Benchmark
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 31 of 56 submissions, 55%