ABSTRACT
Business data analytics requires processing large numbers of data streams and creating materialized views to provide near real-time answers to user queries. Materializing the view of each query and refreshing it continuously through a separate query execution plan is neither efficient nor scalable. In this paper, we present a global query execution plan that supports multiple queries simultaneously and minimizes the number of input scans, operators, and tuples flowing between operators. We propose shared-execution techniques for creating and maintaining materialized views in support of business data analytics queries, exploiting the commonalities among multiple queries to process big data streams scalably and efficiently. The paper highlights shared-execution techniques for selection predicates, grouping, and aggregate calculations. We also show how global query execution plans run in INGA, a distributed stream processing system built on top of Storm.
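To make the sharing idea concrete, the Python sketch below shows two of the techniques named above on a toy stream. Each distinct selection predicate is evaluated once per tuple, however many queries use it, and queries with the same filter and measure share one incrementally maintained SUM kept at the union of their group-by columns, from which each query's view is rolled up. This is an illustrative sketch under stated assumptions, not INGA's implementation: `Query`, `SharedPlan`, the predicate names, and the sample rows are hypothetical, and only SUM (a distributive aggregate) is handled.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

Row = Dict[str, object]
Key = Tuple[FrozenSet[str], str]  # (filter predicates, summed measure)


@dataclass(frozen=True)
class Query:
    """Hypothetical continuous query:
    SELECT group_by, SUM(measure) WHERE <all predicates hold> GROUP BY group_by."""
    name: str
    predicates: FrozenSet[str]
    group_by: Tuple[str, ...]
    measure: str


class SharedPlan:
    def __init__(self, predicates: Dict[str, Callable[[Row], bool]], queries: List[Query]):
        self.predicates = predicates
        # Only predicates referenced by at least one query are evaluated.
        self.used = sorted(set().union(*(q.predicates for q in queries)))
        self.predicate_evals = 0
        # Queries with the same filter and measure share one aggregate, kept at the
        # finest granularity: the union of their group-by columns.
        self.columns: Dict[Key, Tuple[str, ...]] = {}
        for q in queries:
            key = (q.predicates, q.measure)
            cols = self.columns.get(key, ())
            self.columns[key] = cols + tuple(c for c in q.group_by if c not in cols)
        self.state: Dict[Key, Dict[tuple, float]] = {k: defaultdict(float) for k in self.columns}

    def process(self, row: Row) -> None:
        # Shared selection: each distinct predicate is evaluated once per tuple.
        outcome = {}
        for name in self.used:
            outcome[name] = self.predicates[name](row)
            self.predicate_evals += 1
        # Shared aggregation: one incremental update per shared aggregate.
        for key, cols in self.columns.items():
            preds, measure = key
            if all(outcome[p] for p in preds):
                self.state[key][tuple(row[c] for c in cols)] += row[measure]

    def view(self, q: Query) -> Dict[tuple, float]:
        # Roll the shared SUM up to the query's own group-by columns
        # (valid because SUM is distributive).
        key = (q.predicates, q.measure)
        positions = [self.columns[key].index(c) for c in q.group_by]
        out: Dict[tuple, float] = defaultdict(float)
        for group, total in self.state[key].items():
            out[tuple(group[i] for i in positions)] += total
        return dict(out)


predicates = {
    "eu": lambda r: r["region"] == "EU",
    "big": lambda r: r["amount"] > 100,
}
q1 = Query("eu_by_country", frozenset({"eu"}), ("country",), "amount")
q2 = Query("eu_by_product", frozenset({"eu"}), ("product",), "amount")
q3 = Query("big_eu_by_country", frozenset({"eu", "big"}), ("country",), "amount")
plan = SharedPlan(predicates, [q1, q2, q3])

for row in [
    {"region": "EU", "country": "DE", "product": "A", "amount": 50.0},
    {"region": "EU", "country": "FR", "product": "A", "amount": 150.0},
    {"region": "US", "country": "US", "product": "B", "amount": 200.0},
]:
    plan.process(row)

print(plan.view(q1))         # {('DE',): 50.0, ('FR',): 150.0}
print(plan.view(q2))         # {('A',): 200.0}
print(plan.view(q3))         # {('FR',): 150.0}
print(plan.predicate_evals)  # 6: two predicates x three tuples
```

Evaluated query by query, the same three queries would test `eu` three times per tuple and maintain three separate aggregates; here `eu` is tested once per tuple and only two aggregates are maintained. A non-distributive aggregate such as AVG would need extra state (for example, a sum and a count) before it could be rolled up this way.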
In INGA, we support online maintenance of 2,500 materialized views using 237 queries by exploiting the constructs shared among the queries. All 237 queries run in a single global query execution plan tree of depth 21.
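The effect of folding many queries into one global plan can be sketched by merging per-query operator trees so that structurally identical subplans are instantiated only once, then comparing operator counts and depth. The sketch below assumes a simple form of common-subexpression elimination over hypothetical `Op` nodes; it only merges identical subtrees and does not reproduce INGA's plan construction or its 21-level plan.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Op:
    """Hypothetical plan operator; frozen, so identical subtrees compare equal and hash alike."""
    kind: str
    params: Tuple = ()
    children: Tuple["Op", ...] = ()


def merge(plans: List[Op]) -> Dict[Op, int]:
    """Assign one operator id per distinct subtree, so shared subplans exist once."""
    ids: Dict[Op, int] = {}

    def visit(node: Op) -> None:
        for child in node.children:
            visit(child)
        ids.setdefault(node, len(ids))

    for plan in plans:
        visit(plan)
    return ids


def count_ops(node: Op) -> int:
    # Operators a plan would need if it were executed on its own.
    return 1 + sum(count_ops(c) for c in node.children)


def depth(node: Op) -> int:
    # Operator levels on the longest path from this node down to a scan.
    return 1 + max((depth(c) for c in node.children), default=0)


scan = Op("scan", ("sales",))
eu = Op("select", ("region = 'EU'",), (scan,))
big = Op("select", ("amount > 100",), (eu,))
plans = [
    Op("group_sum", (("country",), "amount"), (eu,)),
    Op("group_sum", (("product",), "amount"), (eu,)),
    Op("group_sum", (("country",), "amount"), (big,)),
]

shared = merge(plans)
separate = sum(count_ops(p) for p in plans)
scans = sum(1 for op in shared if op.kind == "scan")
print(separate, len(shared))         # 10 6
print(scans)                         # 1
print(max(depth(p) for p in plans))  # 4
```

Run separately, the three plans need ten operators and three scans of the input; merged, they need six operators and a single scan, while the depth of the global plan is set by its longest query.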
Recommendations
Query Processing Techniques for Big Spatial-Keyword Data
SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data
The widespread use of GPS-enabled cellular devices, i.e., smart phones, led to the popularity of numerous mobile applications, e.g., social networks, micro-blogs, mobile web search, and crowd-powered reviews. These applications generate large amounts of ...
Responsible Big Data Analytics for E-Business Services
ICBDR '21: Proceedings of the 5th International Conference on Big Data Research
This paper examines responsible big data analytics for e-business services and looks at how to use responsible big data analytics to obtain responsible e-business services. It addresses why responsibility matters to big data analytics and e-business ...