A Step Toward Deep Online Aggregation

Published: 20 June 2023

Abstract

For exploratory data analysis, it is often desirable to know what answers you are likely to get before actually obtaining those answers. This can potentially be achieved by designing systems that offer estimates of a data operation's result, say op(data), early in the process based on partial data processing. Those estimates are continuously refined as more data is processed and finally converge to the exact answer. Unfortunately, the existing techniques, called Online Aggregation (OLA), are limited to a single operation; that is, we cannot obtain estimates for op(op(data)) or op(...(op(data))). If this Deep OLA becomes possible, data analysts will be able to explore data more interactively using complex cascades of operations.
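To make the single-operation case concrete, the following is a minimal sketch of online aggregation for one SUM, not code from the paper. The function name `online_sum` and the scale-up estimator are illustrative assumptions; the scaling is only meaningful if partitions arrive in an order unrelated to their values.

```python
from typing import Iterator, List


def online_sum(partitions: List[List[float]], total_rows: int) -> Iterator[float]:
    """Yield a refined estimate of SUM(data) after each partition is processed.

    Hypothetical helper: assumes total_rows is known up front and that
    partitions are processed in an order unrelated to their values.
    """
    seen_rows = 0
    partial_sum = 0.0
    for part in partitions:
        partial_sum += sum(part)
        seen_rows += len(part)
        if seen_rows == 0:
            continue  # nothing processed yet, so no estimate to report
        # Scale the partial sum by the inverse of the fraction of rows seen.
        yield partial_sum * total_rows / seen_rows


if __name__ == "__main__":
    data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    for estimate in online_sum(data, total_rows=6):
        print(estimate)
```

Once every partition has been processed, the scale factor is 1 and the last value yielded is the exact sum, which mirrors the convergence behavior described above.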

In this work, we take a step toward Deep OLA with evolving data frames (edf), a novel data model that offers OLA for nested ops, op(...(op(data))), by representing evolving structured data (with converging estimates) that is closed under set operations. That is, op(edf) produces yet another edf; thus, we can freely apply successive operations to an edf and obtain an OLA output for each op. We evaluate its viability with Wake, an edf-based OLA system, by comparing it against state-of-the-art OLA and non-OLA systems. In our experiments on the TPC-H dataset, Wake produces its first estimates 4.93× faster (median), with a 1.3× median slowdown for exact answers, compared to conventional systems. Besides its generality, Wake is also 1.92× faster (median) than existing OLA systems in producing estimates with under 1% relative error.

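The closure property can be illustrated with a small sketch in which an evolving frame is modeled as a Python iterator of successively refined versions. This is an assumption made purely for illustration: it is not Wake's API, and the names `group_sum` and `max_value` and the naive scale-up estimator are hypothetical. Each operation consumes an evolving frame and yields another one, so operations compose as op(op(data)).

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Row = Tuple[str, float]   # (group key, value)
Frame = Dict[str, float]  # one version of an evolving frame


def group_sum(partitions: List[List[Row]], total_rows: int) -> Iterator[Frame]:
    """First op: yield per-group SUM estimates, refined after each partition."""
    sums: Dict[str, float] = defaultdict(float)
    seen = 0
    for part in partitions:
        for key, value in part:
            sums[key] += value
        seen += len(part)
        if seen == 0:
            continue  # no rows processed yet
        scale = total_rows / seen  # naive scale-up, assumed for illustration
        yield {key: total * scale for key, total in sums.items()}


def max_value(edf: Iterator[Frame]) -> Iterator[Frame]:
    """Second op: consume an evolving frame and yield another evolving frame."""
    for frame in edf:
        if frame:
            yield {"max": max(frame.values())}


if __name__ == "__main__":
    parts = [[("a", 1.0), ("b", 2.0)], [("a", 3.0), ("b", 4.0)]]
    # Nested operation: max over per-group sums, with an estimate per version.
    for version in max_value(group_sum(parts, total_rows=4)):
        print(version)
```

Because both functions return the same iterator-of-frames shape, a third operation could be chained onto `max_value` in the same way; the final version of each stage equals the exact answer once all partitions have been processed.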
Supplemental Material

PACMMOD-V1mod124.mp4: Presentation video for SIGMOD 2023 (mp4, 232.6 MB)
