Abstract
The expansion of IoT devices and monitoring needs, powered by the capabilities and accessibility of Cloud Computing, has led to an explosion of streaming data and exposed the need for every organization to exploit it. This paper reviews the evolution of Data Stream Management Systems (DSMS) and their convergence into Online Analytical Processing (OLAP) DSMS. The discussion focuses on three current solutions, all in use in large production environments, that provide real-time OLAP on streaming data: Scuba, Apache Druid, and Apache Pinot. Finally, the paper discusses a potential evolution of OLAP DSMS and open problems.
Acknowledgements
We thank Jorge Orbay, whose comments and feedback helped us improve this paper.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Garcia-Alvarado, C., Kent, J., Liu, L., Hum, J. (2019). Democratization of OLAP DSMS. In: Ordonez, C., Song, IY., Anderst-Kotsis, G., Tjoa, A., Khalil, I. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2019. Lecture Notes in Computer Science, vol 11708. Springer, Cham. https://doi.org/10.1007/978-3-030-27520-4_12
DOI: https://doi.org/10.1007/978-3-030-27520-4_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-27519-8
Online ISBN: 978-3-030-27520-4
eBook Packages: Computer Science, Computer Science (R0)