Efficient and Effective Cardinality Estimation for Skyline Family

Authors:
Xiaoye Miao, Zhejiang University, Hangzhou, China (ORCID 0000-0002-8632-1539)
Yangyang Wu, Zhejiang University, Hangzhou, China (ORCID 0000-0001-9531-0906)
Jiazhen Peng, Zhejiang University, Hangzhou, China (ORCID 0009-0000-6573-0918)
Yunjun Gao, Zhejiang University, Hangzhou, China (ORCID 0000-0003-3816-8450)
Jianwei Yin, Zhejiang University, Hangzhou, China (ORCID 0000-0003-4703-7348)

Proceedings of the ACM on Management of Data, Volume 1, Issue 1, Article No. 104, pp. 1–21. https://doi.org/10.1145/3588958

Published: 30 May 2023

Abstract

Cardinality estimation, which predicts the size of a query result, is a fundamental problem in databases. Existing skyline cardinality estimation methods are computationally infeasible for massive skyline queries over large-scale databases. In this paper, we introduce a unified skyline family that covers various skyline variants. We propose EECE, an efficient and effective end-to-end model for skyline family cardinality estimation. EECE consists of two modules: unsupervised data distribution learning (DDL) and supervised monotonic cardinality estimation (MCE). DDL leverages a mixture data guided transformer to learn the distribution of the database and the query parameters for model pre-training. MCE further incorporates supervised learning and parameter clamping to enhance the estimation under monotonicity guarantees. We also develop an efficient incremental learning algorithm that lets EECE adapt to updates of the database and the query logs. Extensive experiments on several real-world and synthetic datasets demonstrate that EECE speeds up cardinality estimation by six orders of magnitude, with an accuracy gain of more than 39%, compared to state-of-the-art approaches.
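
To make the estimated quantity concrete, the following minimal sketch computes the exact skyline cardinality by brute force. It is not EECE or any method from the paper. It assumes the classic skyline definition (a point belongs to the skyline if no other point dominates it) with smaller values preferred in every dimension; the function names and the `hotels` data are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

Point = Sequence[float]


def dominates(p: Point, q: Point) -> bool:
    """Return True if p dominates q, assuming smaller values are better.

    p dominates q when p is no worse than q in every dimension
    and strictly better in at least one dimension.
    """
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def skyline(points: Sequence[Point]) -> List[Point]:
    """Return all points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def skyline_cardinality(points: Sequence[Point]) -> int:
    """Exact skyline cardinality: the quantity an estimator tries to predict."""
    return len(skyline(points))


# Hypothetical data: (price, distance to the beach); lower is better for both.
hotels = [(50, 3.0), (80, 1.0), (60, 2.0), (90, 2.5), (50, 3.5)]
print(skyline(hotels))
print(skyline_cardinality(hotels))
```

The pairwise comparison costs O(n^2 d) for n points with d dimensions, which illustrates why computing the exact result size for every query becomes expensive on large databases.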

Index Terms

Information systems → Data management systems → Database management system engines → Database query processing → Query optimization

Published in

Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 1, May 2023, 2807 pages
EISSN: 2836-6573
DOI: 10.1145/3603164
Editor: Divyakant Agrawal, UC Santa Barbara, United States

Copyright © 2023 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
cardinality estimation
data distribution learning
monotonic cardinality estimation
skyline family