Cardinality Estimator: Processing SQL with a Vertical Scanning Convolutional Neural Network

Qiao, Shao-Jie; Yang, Guo-Ping; Han, Nan; Chen, Hao; Huang, Fa-Liang; Yue, Kun; Yi, Yu-Gen; Yuan, Chang-An

doi:10.1007/s11390-021-1351-7

Regular Paper
Published: 30 July 2021

Volume 36, pages 762–777 (2021)

Journal of Computer Science and Technology

Shao-Jie Qiao¹,
Guo-Ping Yang¹,
Nan Han²,
Hao Chen³,
Fa-Liang Huang⁴,
Kun Yue⁵,
Yu-Gen Yi⁶ &
Chang-An Yuan⁷

Abstract

Although popular database systems perform well on query optimization, they still produce poor query execution plans when the join operations across multiple tables are complex. Bad execution plans usually result from bad cardinality estimates. The cardinality estimation models in traditional databases cannot provide high-quality estimates because they cannot effectively capture the correlations between multiple tables. Recently, state-of-the-art learning-based cardinality estimation methods have been reported to work better than the traditional empirical methods. Basically, they use deep neural networks to compute the relationships and correlations between tables. In this paper, we propose a vertical scanning convolutional neural network (abbreviated as VSCNN) that captures the relationships between words in the word vector in order to generate a feature map. The proposed learning-based cardinality estimator converts Structured Query Language (SQL) queries from a sentence to a word vector. We encode table names with one-hot encoding and data samples as bitmaps, separately, and then merge them to obtain enough semantic information from the data samples. In particular, the feature map obtained by VSCNN contains semantic information about the tables, joins, and predicates of SQL queries. Importantly, in order to improve the accuracy of cardinality estimation, we propose a negative sampling method for training the word vector by gradient descent from the base table and compress it into a bitmap. Extensive experiments are conducted, and the results show that the q-error of the proposed VSCNN-based model is reduced by at least 14.6% when compared with the estimators in traditional databases.
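As a rough illustration of two ideas named in the abstract, the sketch below one-hot encodes a table name, turns a small data sample into a bitmap, merges the two, and computes the q-error of an estimate. It is not the authors' implementation. The table vocabulary, the sample rows, the predicate, the choice of concatenation as the "merge" step, and the clamping of cardinalities to 1 are all assumptions made for this example. The q-error is taken in its usual form, max(estimate/actual, actual/estimate).

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, int]


def one_hot(name: str, vocabulary: Sequence[str]) -> List[int]:
    """Return a vector with a single 1 at the position of `name` in `vocabulary`."""
    if name not in vocabulary:
        raise ValueError(f"unknown table name: {name!r}")
    return [1 if entry == name else 0 for entry in vocabulary]


def sample_bitmap(rows: Sequence[Row], predicate: Callable[[Row], bool]) -> List[int]:
    """Return one bit per sampled row: 1 if the row satisfies the predicate, else 0."""
    return [1 if predicate(row) else 0 for row in rows]


def encode_table(name: str, vocabulary: Sequence[str],
                 rows: Sequence[Row], predicate: Callable[[Row], bool]) -> List[int]:
    """Merge both encodings; plain concatenation is an assumption of this sketch."""
    return one_hot(name, vocabulary) + sample_bitmap(rows, predicate)


def q_error(estimated: float, actual: float) -> float:
    """Factor by which an estimate is off, regardless of direction (always >= 1)."""
    est = max(estimated, 1.0)  # assumption: clamp to 1 to avoid division by zero
    act = max(actual, 1.0)
    return max(est / act, act / est)


# Hypothetical table vocabulary and a four-row sample of the "title" table.
TABLES = ["title", "movie_info", "cast_info"]
SAMPLE = [
    {"production_year": 1995},
    {"production_year": 2005},
    {"production_year": 2010},
    {"production_year": 1980},
]

features = encode_table("title", TABLES, SAMPLE,
                        lambda row: row["production_year"] > 2000)
print(features)
print(q_error(50, 100), q_error(200, 100))
```

The q-error treats underestimates and overestimates symmetrically, so an estimate of 50 and an estimate of 200 for a true cardinality of 100 are penalized equally.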

Author information

Authors and Affiliations

School of Software Engineering, Chengdu University of Information Technology, Chengdu, 610225, China
Shao-Jie Qiao & Guo-Ping Yang
School of Management, Chengdu University of Information Technology, Chengdu, 610225, China
Nan Han
Beijing Huawei Digital Technologies Co., Ltd., Beijing, 100085, China
Hao Chen
School of Computer and Information Engineering, Nanning Normal University, Nanning, 530299, China
Fa-Liang Huang
School of Information Science and Engineering, Yunnan University, Kunming, 650500, China
Kun Yue
School of Software, Jiangxi Normal University, Nanchang, 330022, China
Yu-Gen Yi
Guangxi College of Education, Nanning, 530007, China
Chang-An Yuan

Corresponding author

Correspondence to Nan Han.

Supplementary Information

ESM 1

(PDF 201 kb)

About this article

Cite this article

Qiao, SJ., Yang, GP., Han, N. et al. Cardinality Estimator: Processing SQL with a Vertical Scanning Convolutional Neural Network. J. Comput. Sci. Technol. 36, 762–777 (2021). https://doi.org/10.1007/s11390-021-1351-7

Received: 01 February 2021
Accepted: 01 July 2021
Published: 30 July 2021
Issue Date: July 2021
DOI: https://doi.org/10.1007/s11390-021-1351-7
