Abstract
Driven by the growing volume of industrial data over recent decades, big data systems have evolved rapidly. The diversity and complexity of industrial applications make it challenging for companies to choose an appropriate big data system, so benchmarking big data systems has become a research hotspot. However, most state-of-the-art benchmarks focus on a specific domain or data format.
This paper presents our multimodel industrial big data benchmark, called MiDBench. MiDBench targets big data systems in three scenarios, crane assembly, wind turbine monitoring, and simulation results management, which correspond to bills of materials (a.k.a. BoM), time series, and unstructured data, respectively. We have selected and developed eleven typical workloads across these three application domains, and we generate synthetic data by scaling sample data. For fairness, we adopt the widely accepted metrics of throughput and response time. Together, these components form a credible benchmark suite for high-end manufacturing. Experimental results show that Neo4j (representing graph databases) outperforms Oracle (representing relational databases) on BoM data, that IoTDB outperforms InfluxDB on time series query and stress tests, and that MongoDB outperforms ElasticSearch in the simulation results management domain.
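To illustrate the two metrics, a benchmark run can be reduced to timing each operation and the whole batch. The sketch below is a minimal, hypothetical harness in Python; the `run_workload` function and its callable-based interface are illustrative assumptions, not part of MiDBench itself:

```python
import time
import statistics

def run_workload(operations):
    """Execute a batch of zero-argument operations, recording per-operation latency.

    Returns (throughput in ops/second, mean response time in seconds).
    """
    latencies = []
    start = time.perf_counter()
    for op in operations:
        t0 = time.perf_counter()
        op()  # one benchmark operation, e.g. a query against the system under test
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    throughput = len(operations) / elapsed        # operations completed per second
    mean_response = statistics.mean(latencies)    # average per-operation latency
    return throughput, mean_response

# Usage: time 100 trivial no-op "queries".
tp, rt = run_workload([lambda: None for _ in range(100)])
```

In a real benchmark the operations would issue queries against the system under test, and response time is usually reported as percentiles in addition to the mean.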
Change history
08 October 2019
In the version of this paper that was originally published, reference 3 linked to the wrong website. This has been corrected.
References
Elasticsearch. https://www.elastic.co/
InfluxDB. https://www.influxdata.com/
IoTDB. https://iotdb.apache.org/
MongoDB. https://www.mongodb.com/
MySQL. https://www.mysql.com
Neo4j. https://neo4j.com/
Oracle. https://www.oracle.com
Time series benchmark suite (TSBS). https://github.com/timescale/tsbs
TPC: TPC-A specification, June 1994. http://www.tpc.org/tpca/spec/tpca_current.pdf
TPC: TPC-C specification, February 2010. http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-c_v5.11.0.pdf
TPC: TPC-DS specification, November 2015. http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-ds_v2.1.0.pdf
TPC: TPC-E specification, April 2015. http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-e_v1.14.0.pdf
TPC: TPC-H specification, November 2014. http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-h_v2.17.1.pdf
Anderson, T.L., Berre, A.J., Mallison, M., Porter, H.H., Schneider, B.: The HyperModel benchmark. In: Bancilhon, F., Thanos, C., Tsichritzis, D. (eds.) EDBT 1990. LNCS, vol. 416, pp. 317–331. Springer, Heidelberg (1990). https://doi.org/10.1007/BFb0022180
Arasu, A., et al.: Linear road: a stream data management benchmark. In: Proceedings of the Thirtieth International Conference on Very Large Data Bases, VLDB 2004, vol. 30, pp. 480–491. VLDB Endowment (2004). http://dl.acm.org/citation.cfm?id=1316689.1316732
Armstrong, T.G., Ponnekanti, V., Borthakur, D., Callaghan, M.: LinkBench: a database benchmark based on the Facebook social graph. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, pp. 1185–1196. ACM, New York (2013). https://doi.org/10.1145/2463676.2465296
Böhme, T., Rahm, E.: Multi-user evaluation of XML data management systems with XMach-1. In: Bressan, S., Lee, M.L., Chaudhri, A.B., Yu, J.X., Lacroix, Z. (eds.) Efficiency and Effectiveness of XML Tools and Techniques and Data Integration over the Web. LNCS, vol. 2590, pp. 148–159. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-36556-7_12
Jin, C.-Q., Qian, W.-N., Zhou, M.-Q., Zhou, A.-Y.: Benchmarking data management systems: from traditional database to emergent big data. Chin. J. Comput. (2014). http://cjc.ict.ac.cn/online/bfpub/jcq-2014430143239.pdf
Ferdman, M., et al.: Clearing the clouds: a study of emerging scale-out workloads on modern hardware, pp. 37–48 (2012). https://www.industry-academia.org/download/ASPLOS12_Clearing_the_Clouds.pdf
Ghazal, A., et al.: BigBench: towards an industry standard benchmark for big data analytics. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, pp. 1197–1208. ACM, New York (2013). https://doi.org/10.1145/2463676.2463712
Huang, S., Huang, J., Dai, J., Xie, T., Huang, B.: The HiBench benchmark suite: characterization of the MapReduce-based data analysis. In: 2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010), pp. 41–51, March 2010. https://doi.org/10.1109/ICDEW.2010.5452747
Jia, Z., Wang, L., Zhan, J., Zhang, L., Luo, C.: Characterizing data analysis workloads in data centers. In: 2013 IEEE International Symposium on Workload Characterization (IISWC), pp. 66–76, September 2013. https://doi.org/10.1109/IISWC.2013.6704671
Li, Y.G., et al.: XOO7: applying OO7 benchmark to XML query processing tool. In: Proceedings of the Tenth International Conference on Information and Knowledge Management, CIKM 2001, pp. 167–174. ACM, New York (2001). https://doi.org/10.1145/502585.502614
Ming, Z., et al.: BDGS: a scalable big data generator suite in big data benchmarking. In: Rabl, T., Jacobsen, H.-A., Raghunath, N., Poess, M., Bhandarkar, M., Baru, C. (eds.) WBDB 2013. LNCS, vol. 8585, pp. 138–154. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10596-3_11
Myllymaki, J., Kaufman, J.: DynaMark: a benchmark for dynamic spatial indexing. In: Chen, M.-S., Chrysanthis, P.K., Sloman, M., Zaslavsky, A. (eds.) MDM 2003. LNCS, vol. 2574, pp. 92–105. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-36389-0_7
Nicola, M., Kogan, I., Schiefer, B.: An XML transaction processing benchmark. In: Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, SIGMOD 2007, pp. 937–948. ACM, New York (2007). https://doi.org/10.1145/1247480.1247590
O’Neil, P.E.: The set query benchmark. In: The Benchmark Handbook (1991)
Schmidt, A., Waas, F., Kersten, M., Carey, M.J., Manolescu, I., Busse, R.: XMark: a benchmark for XML data management. In: Proceedings of the 28th International Conference on Very Large Data Bases, VLDB 2002, pp. 974–985. VLDB Endowment (2002). http://dl.acm.org/citation.cfm?id=1287369.1287455
Wang, L., et al.: BigDataBench: a big data benchmark suite from internet services. CoRR abs/1401.1406 (2014). http://arxiv.org/abs/1401.1406
Yao, B.B., Özsu, M.T., Khandelwal, N.: XBench benchmark and performance testing of XML DBMSs. In: Proceedings of the 20th International Conference on Data Engineering, ICDE 2004, pp. 621–632. IEEE Computer Society, Washington, DC (2004). http://dl.acm.org/citation.cfm?id=977401.978145
Zhu, Y., et al.: BigOP: generating comprehensive big data workloads as a benchmarking framework. In: Bhowmick, S.S., Dyreson, C.E., Jensen, C.S., Lee, M.L., Muliantara, A., Thalheim, B. (eds.) DASFAA 2014. LNCS, vol. 8422, pp. 483–492. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-05813-9_32
Acknowledgment
This work is partially supported by the Ministry of Science and Technology of China, National Key Research and Development Program (No. 2016YFB1000702), and the NSF China under grant No. 61432006. MiDBench is available at https://github.com/dbiir/MiDBench.
© 2019 Springer Nature Switzerland AG
Cite this paper
Cheng, Y. et al. (2019). MiDBench: Multimodel Industrial Big Data Benchmark. In: Zheng, C., Zhan, J. (eds) Benchmarking, Measuring, and Optimizing. Bench 2018. Lecture Notes in Computer Science(), vol 11459. Springer, Cham. https://doi.org/10.1007/978-3-030-32813-9_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32812-2
Online ISBN: 978-3-030-32813-9