Abstract
Cardinality estimation, i.e., predicting the size of a query result, is a fundamental problem in databases. Existing skyline cardinality estimation methods are computationally infeasible for massive skyline queries over large-scale databases. In this paper, we introduce a unified skyline family that covers various skyline variants. We propose EECE, an efficient and effective end-to-end cardinality estimation model for the skyline family. EECE consists of two modules: unsupervised data distribution learning (DDL) and supervised monotonic cardinality estimation (MCE). DDL leverages a mixture data guided transformer to learn the distribution of the database and the query parameters for model pre-training. MCE further incorporates supervised learning and parameter clamping to enhance the estimation under monotonicity guarantees. We develop an efficient incremental learning algorithm that lets EECE adapt to updates of the database and query logs. Extensive experiments on several real-world and synthetic datasets demonstrate that EECE speeds up cardinality estimation by six orders of magnitude, with an accuracy gain of more than 39%, compared to state-of-the-art approaches.
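To make the estimated quantity concrete, the following minimal sketch computes the exact skyline cardinality of a small dataset by pairwise dominance checks. It is not the EECE model; it only illustrates the value that a skyline cardinality estimator tries to predict without running the query. The function names, the toy `hotels` data, and the convention that smaller values are preferred in every dimension are assumptions made for illustration.

```python
from typing import List, Sequence

Point = Sequence[float]


def dominates(p: Point, q: Point) -> bool:
    """Return True if p dominates q.

    Assumption: smaller is better in every dimension. p dominates q when
    p is no worse than q everywhere and strictly better somewhere.
    """
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Return the points that no other point dominates (O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def skyline_cardinality(points: List[Point]) -> int:
    """Exact skyline cardinality, i.e. the size of the skyline result."""
    return len(skyline(points))


# Hypothetical toy data: (price, distance to the beach) for five hotels.
hotels = [(50, 8), (60, 5), (80, 2), (70, 6), (90, 9)]

if __name__ == "__main__":
    print(skyline(hotels))
    print(skyline_cardinality(hotels))
```

The quadratic scan is fine for a toy example, but its cost grows quickly with the data size, which is the kind of overhead that motivates estimating the cardinality instead of computing it.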
Index Terms
- Efficient and Effective Cardinality Estimation for Skyline Family