Democratization of OLAP DSMS

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11708)

Abstract

The expansion of IoT devices and monitoring needs, powered by the capabilities and accessibility of Cloud Computing, has led to an explosion of streaming data and exposed the need for every organization to exploit it. This paper reviews the evolution of Data Stream Management Systems (DSMS) and their convergence into Online Analytical Processing (OLAP) DSMS. The discussion focuses on three current solutions, Scuba, Apache Druid, and Apache Pinot, which are in use in large production environments and provide real-time OLAP on streaming data. Finally, the paper discusses a potential evolution of OLAP DSMS and open problems.

Acknowledgements

We thank Jorge Orbay, whose comments and feedback helped us improve this paper.

Author information

Corresponding author

Correspondence to Carlos Garcia-Alvarado.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Garcia-Alvarado, C., Kent, J., Liu, L., Hum, J. (2019). Democratization of OLAP DSMS. In: Ordonez, C., Song, IY., Anderst-Kotsis, G., Tjoa, A., Khalil, I. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2019. Lecture Notes in Computer Science, vol 11708. Springer, Cham. https://doi.org/10.1007/978-3-030-27520-4_12

  • DOI: https://doi.org/10.1007/978-3-030-27520-4_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-27519-8

  • Online ISBN: 978-3-030-27520-4

  • eBook Packages: Computer Science, Computer Science (R0)