DOI:10.1145/3400903.3400932
short-paper

Shared Execution Techniques for Business Data Analytics over Big Data Streams

Published: 30 July 2020

ABSTRACT

Business data analytics requires processing large numbers of data streams and creating materialized views in order to provide near real-time answers to user queries. Materializing the view of each query and continuously refreshing it as a separate query execution plan is neither efficient nor scalable. In this paper, we present a global query execution plan that supports multiple queries simultaneously and minimizes the number of input scans, operators, and tuples flowing between operators. We propose shared-execution techniques for creating and maintaining materialized views in support of business data analytics queries. We exploit commonalities among multiple business data analytics queries to support scalable and efficient processing of big data streams. The paper highlights shared-execution techniques for select predicates, grouping, and aggregate calculations. We show how global query execution plans are run in a distributed stream processing system called INGA, which is built on top of Storm.

In INGA, we support online view maintenance of 2500 materialized views using 237 queries by exploiting the constructs shared between the queries. We run all 237 queries using a single global query execution plan tree with a depth of 21.
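
The idea of sharing select predicates, grouping, and aggregates across queries can be illustrated with a minimal sketch. It is not INGA's implementation and does not use Storm. The query definitions, predicate names, tuple fields, and the helpers `build_shared_plan` and `run_shared` are all hypothetical, chosen only to show how one input scan can feed several views while each distinct predicate and each distinct aggregate is evaluated once per tuple.

```python
from collections import defaultdict

# Hypothetical queries: a named select predicate, a group-by attribute,
# and a field to SUM. None of these come from the paper.
QUERIES = {
    "q1": {"pred": "region_eu", "group_by": "product", "agg_field": "amount"},
    "q2": {"pred": "region_eu", "group_by": "customer", "agg_field": "amount"},
    "q3": {"pred": "large_order", "group_by": "product", "agg_field": "amount"},
    "q4": {"pred": "region_eu", "group_by": "product", "agg_field": "amount"},
}

# Hypothetical predicate implementations, keyed by name so that queries
# naming the same predicate share a single evaluation.
PREDICATES = {
    "region_eu": lambda t: t["region"] == "EU",
    "large_order": lambda t: t["amount"] >= 100,
}

STREAM = [
    {"region": "EU", "product": "a", "customer": "x", "amount": 50},
    {"region": "US", "product": "a", "customer": "y", "amount": 150},
    {"region": "EU", "product": "b", "customer": "x", "amount": 120},
]


def build_shared_plan(queries):
    """Group queries by predicate, then by (group_by, agg_field).

    Each distinct predicate and each distinct aggregate under it appears
    once in the plan, however many queries use it.
    """
    plan = defaultdict(lambda: defaultdict(list))
    for name, q in queries.items():
        plan[q["pred"]][(q["group_by"], q["agg_field"])].append(name)
    return plan


def run_shared(plan, stream):
    """Scan the stream once and maintain every view from shared operators."""
    # State of each shared aggregate: (pred, group_by, agg_field) -> {group: sum}
    state = defaultdict(lambda: defaultdict(float))
    predicate_evals = 0
    for tup in stream:  # single input scan for all queries
        for pred, aggs in plan.items():
            predicate_evals += 1  # one evaluation per distinct predicate
            if not PREDICATES[pred](tup):
                continue
            for group_by, field in aggs:  # one update per distinct aggregate
                state[(pred, group_by, field)][tup[group_by]] += tup[field]
    # Fan the shared results out to the per-query views.
    views = {}
    for pred, aggs in plan.items():
        for key, names in aggs.items():
            for name in names:
                views[name] = dict(state[(pred, *key)])
    return views, predicate_evals


if __name__ == "__main__":
    views, evals = run_shared(build_shared_plan(QUERIES), STREAM)
    for name in sorted(views):
        print(name, views[name])
    print("predicate evaluations:", evals)
```

In this sketch `q1` and `q4` are served by the same aggregate operator, and `q1`, `q2`, and `q4` share one evaluation of the `region_eu` predicate per tuple. Running each of the four queries as its own plan would instead scan the input four times and evaluate a predicate for every query on every tuple.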


  • Published in

    ACM Other conferences
    SSDBM '20: Proceedings of the 32nd International Conference on Scientific and Statistical Database Management
    July 2020
    241 pages
    ISBN:9781450388146
    DOI:10.1145/3400903

    Copyright © 2020 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • short-paper
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate: 56 of 146 submissions, 38%
