research-article

A Step Toward Deep Online Aggregation

Authors:
Nikhil Sheoran, Databricks & University of Illinois at Urbana-Champaign, Mountain View, CA, USA (ORCID 0000-0003-4533-826X)
Supawit Chockchowwat, University of Illinois at Urbana-Champaign, Champaign, IL, USA (ORCID 0000-0003-2881-8501)
Arav Chheda, University of Illinois at Urbana-Champaign, Champaign, IL, USA (ORCID 0009-0006-0882-2983)
Suwen Wang, University of Illinois at Urbana-Champaign, Champaign, IL, USA (ORCID 0009-0000-1385-8426)
Riya Verma, University of Illinois at Urbana-Champaign, Champaign, IL, USA (ORCID 0009-0005-6773-5436)
Yongjoo Park, University of Illinois at Urbana-Champaign, Champaign, IL, USA (ORCID 0000-0003-3786-6214)

Proceedings of the ACM on Management of Data, Volume 1, Issue 2, Article No. 124, pp. 1–28. https://doi.org/10.1145/3589269

Published: 20 June 2023


Abstract

For exploratory data analysis, it is often desirable to know what answers you are likely to get before actually obtaining those answers. This can potentially be achieved by designing systems that offer estimates of a data operation's result, say op(data), earlier in the process based on partial data processing. Those estimates are continuously refined as more data is processed and finally converge to the exact answer. Unfortunately, the existing techniques, called Online Aggregation (OLA), are limited to a single operation; that is, we cannot obtain estimates for op(op(data)) or op(...(op(data))). If this Deep OLA becomes possible, data analysts will be able to explore data more interactively using complex cascade operations.
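To make the single-operation case concrete, the following is a minimal sketch of online aggregation for one op, a running mean over data partitions. It is an illustration only: the function name `online_mean` and the toy data are assumptions, and it does not reproduce the estimators of any system named on this page.

```python
from typing import Iterable, Iterator, List, Tuple


def online_mean(partitions: Iterable[List[float]]) -> Iterator[Tuple[int, float]]:
    """Yield (rows_seen, running_mean) after each partition is processed.

    Hypothetical helper: each yielded value is an estimate of the final mean
    based only on the data seen so far; the last one is the exact answer.
    """
    total = 0.0
    count = 0
    for part in partitions:
        total += sum(part)
        count += len(part)
        if count:  # skip leading empty partitions to avoid dividing by zero
            yield count, total / count


# Toy data split into three partitions (assumed for illustration).
data = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
estimates = list(online_mean(data))
for rows_seen, mean in estimates:
    print(f"after {rows_seen} rows: mean ~ {mean}")
```

The estimate is available after the first partition and is revised with every further partition, which is the behavior the abstract describes for a single op(data).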

In this work, we take a step toward Deep OLA with evolving data frames (edf), a novel data model that offers OLA for nested ops, op(...(op(data))), by representing evolving structured data (with converging estimates) that is closed under set operations. That is, op(edf) produces yet another edf; thus, we can freely apply successive operations to an edf and obtain an OLA output for each op. We evaluate its viability with Wake, an edf-based OLA system, by comparing it against state-of-the-art OLA and non-OLA systems. In our experiments on the TPC-H dataset, Wake produces its first estimates 4.93× faster (median), with a 1.3× median slowdown for exact answers, compared to conventional systems. Besides its generality, Wake is also 1.92× faster (median) than existing OLA systems in producing estimates with under 1% relative error.
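The closure property, op(edf) yielding another edf, can be sketched with Python generators. This is not Wake's API or its estimation method: the names `evolving_group_sum` and `evolving_max` are hypothetical, and the linear scaling by the fraction of rows processed is a naive stand-in estimator chosen only to keep the example short.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple[str, float]


def evolving_group_sum(
    partitions: Iterable[List[Row]], total_rows: int
) -> Iterator[Dict[str, float]]:
    """First op: per-key sums, scaled up by the fraction of rows seen so far.

    Each yielded dict is one state of an evolving result. The scaling is an
    assumed, naive estimator, not the one used in the paper.
    """
    sums: Dict[str, float] = defaultdict(float)
    seen = 0
    for part in partitions:
        for key, value in part:
            sums[key] += value
        seen += len(part)
        if seen == 0:
            continue  # nothing processed yet, no estimate to report
        scale = total_rows / seen
        yield {key: partial * scale for key, partial in sums.items()}


def evolving_max(states: Iterable[Dict[str, float]]) -> Iterator[float]:
    """Second op: consumes an evolving result and yields another evolving result."""
    for state in states:
        yield max(state.values())


# Toy input: 4 rows in 2 partitions (assumed for illustration).
partitions = [[("a", 1.0), ("b", 2.0)], [("a", 5.0), ("b", 2.0)]]
nested_estimates = list(evolving_max(evolving_group_sum(partitions, total_rows=4)))
print(nested_estimates)
```

Because each op both consumes and produces a sequence of refining states, ops can be chained to any depth, and the final state of the chain equals the exact nested answer (here, the largest per-key sum).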

Supplemental Material

PACMMOD-V1mod124.mp4

Presentation video for SIGMOD 2023



Published in
Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 2, June 2023, 2310 pages
EISSN: 2836-6573
DOI: 10.1145/3605748
Editor: Divyakant Agrawal, UC Santa Barbara, United States
Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
SQL
cardinality estimation
confidence interval
data frame
evolving data frame
nested query
online aggregation
time-series forecasting