research-article

Efficient and Effective Cardinality Estimation for Skyline Family

Published: 30 May 2023

Abstract

Cardinality estimation, i.e., predicting the size of a query result, is a fundamental problem in databases. Existing skyline cardinality estimation methods are computationally infeasible for massive skyline query workloads over large-scale databases. In this paper, we introduce a unified skyline family that covers various skyline variants. We propose EECE, an efficient and effective end-to-end cardinality estimation model for the skyline family. EECE consists of two modules: unsupervised data distribution learning (DDL) and supervised monotonic cardinality estimation (MCE). DDL leverages a mixture data guided transformer to learn the distribution of the database and query parameters for model pre-training. MCE further incorporates supervised learning and parameter clamping to improve estimation under monotonicity guarantees. We also develop an efficient incremental learning algorithm that lets EECE adapt to updates of the database and query logs. Extensive experiments on several real-world and synthetic datasets demonstrate that EECE speeds up cardinality estimation by six orders of magnitude, with an accuracy gain of more than 39%, compared to state-of-the-art approaches.
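
To make the estimated quantity concrete, the following is a minimal sketch of how the exact cardinality of a plain skyline query could be computed by brute force. It is not the EECE model and not any method from the paper. It assumes the standard dominance definition with smaller values preferred on every dimension, and the function names and the toy `hotels` data are hypothetical. The quadratic number of dominance checks hints at why exact computation becomes costly at scale, which is what motivates estimation.

```python
from typing import List, Sequence


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Return True if p dominates q.

    Assumption: smaller is better on every dimension. p dominates q when
    p is no worse than q everywhere and strictly better somewhere.
    """
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def skyline(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Naive skyline: keep every point that no other point dominates.

    Runs in O(n^2 * d) time for n points with d dimensions.
    """
    return [p for p in points if not any(dominates(q, p) for q in points)]


def skyline_cardinality(points: List[Sequence[float]]) -> int:
    """Exact result size of the skyline query, the value an estimator predicts."""
    return len(skyline(points))


# Hypothetical toy data: (price, distance to beach) for six hotels.
hotels = [(50, 8), (60, 3), (80, 1), (70, 5), (90, 2), (50, 9)]

print(skyline(hotels))              # the undominated hotels
print(skyline_cardinality(hotels))  # the exact cardinality
```

A cardinality estimator would aim to predict the value returned by `skyline_cardinality` without running the dominance checks over the whole dataset.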

    • Published in

      Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 1
      May 2023
      2807 pages
      EISSN: 2836-6573
      DOI: 10.1145/3603164

      Copyright © 2023 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States
