Cardinality Estimator: Processing SQL with a Vertical Scanning Convolutional Neural Network

  • Regular Paper
Journal of Computer Science and Technology

Abstract

Although popular database systems perform well at query optimization, they still produce poor query execution plans when queries involve complex joins across multiple tables. Such bad plans are usually caused by poor cardinality estimates. The cardinality estimation models in traditional databases cannot provide high-quality estimates because they cannot effectively capture the correlations between multiple tables. Recently, state-of-the-art learning-based cardinality estimators have been shown to outperform traditional empirical methods; they use deep neural networks to model the relationships and correlations between tables. In this paper, we propose a vertical scanning convolutional neural network (VSCNN) that captures the relationships between words in the word vectors of a query in order to generate a feature map. The proposed learning-based cardinality estimator converts each Structured Query Language (SQL) query from a sentence into word vectors, separately encodes table names with one-hot encoding and data samples as bitmaps, and then merges these encodings to obtain sufficient semantic information from the data samples. In particular, the feature map produced by VSCNN contains semantic information about the tables, joins, and predicates of SQL queries. Importantly, to improve the accuracy of cardinality estimation, we propose using negative sampling to train the word vectors from the base tables by gradient descent, and we compress the result into bitmaps. Extensive experiments show that the q-error of the proposed VSCNN-based model is at least 14.6% lower than that of the estimators in traditional databases.
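
The table and sample encoding step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the table vocabulary `TABLES`, the sample rows, and the helper names `one_hot_table`, `sample_bitmap` and `encode_table` are all hypothetical. Each table name becomes a one-hot vector, each sampled row of the base table contributes one bit recording whether it satisfies the query's predicate, and the two encodings are concatenated into a single feature vector.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical table vocabulary; a real system would derive it from the schema.
TABLES: List[str] = ["title", "movie_info", "cast_info"]

Row = Dict[str, int]


def one_hot_table(name: str, tables: Sequence[str] = TABLES) -> List[int]:
    """Encode a table name as a one-hot vector over a fixed table vocabulary."""
    vec = [0] * len(tables)
    vec[list(tables).index(name)] = 1
    return vec


def sample_bitmap(samples: Sequence[Row], predicate: Callable[[Row], bool]) -> List[int]:
    """Bit i is 1 if the i-th sampled row of the base table satisfies the predicate."""
    return [1 if predicate(row) else 0 for row in samples]


def encode_table(name: str, samples: Sequence[Row],
                 predicate: Callable[[Row], bool]) -> List[int]:
    """Merge (concatenate) the one-hot table code with the sample bitmap."""
    return one_hot_table(name) + sample_bitmap(samples, predicate)
```

For example, with four sampled `title` rows and the predicate `production_year > 2000`, `encode_table` returns the three one-hot bits for `title` followed by one bit per sampled row. The bitmap lets the model see which sampled tuples actually qualify, which is information a plain one-hot code cannot carry.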
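
One plausible reading of "vertical scanning" follows the usual text-CNN convention: the query is an n × d matrix with one row per word vector, and each kernel spans the full embedding width d, so it can only slide vertically along the word axis. The sketch below assumes that reading; the abstract does not confirm it, and the function name `vertical_scan_conv` and the ReLU activation are illustrative choices.

```python
from typing import List

Matrix = List[List[float]]


def vertical_scan_conv(words: Matrix, kernel: Matrix, bias: float = 0.0) -> List[float]:
    """Slide a (k x d) kernel down an (n x d) word matrix, one output per window.

    Because the kernel is as wide as the embedding dimension d, it moves only
    along the word axis, producing a feature map of length n - k + 1.
    """
    n, d = len(words), len(words[0])
    k = len(kernel)
    if len(kernel[0]) != d:
        raise ValueError("kernel width must equal the embedding dimension")
    feature_map: List[float] = []
    for start in range(n - k + 1):
        s = bias
        # Element-wise product of the kernel with k consecutive word vectors.
        for i in range(k):
            for j in range(d):
                s += words[start + i][j] * kernel[i][j]
        feature_map.append(max(0.0, s))  # ReLU activation (assumed)
    return feature_map
```

With n words and a kernel of height k, each feature map has n - k + 1 entries. A trained model would learn many such kernels (for example of heights 2, 3 and 4) and typically pool each feature map before the regression layers.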
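
In its standard form (the skip-gram with negative sampling objective of Mikolov et al.), training word vectors with negative sampling means gradient descent on the loss -log σ(u_o · v_c) - Σ_k log σ(-u_k · v_c). Here v_c is the center word's vector, u_o is the vector of an observed context word, and the u_k are vectors of randomly drawn "negative" words. The sketch below performs one SGD step of that standard objective. How the paper builds word/context pairs from the base tables, how negatives are drawn, and how the vectors are compressed into bitmaps are not shown, and the helper names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sgns_step(center: Vector, context: Vector, negatives: List[Vector],
              lr: float = 0.1) -> float:
    """One SGD step of skip-gram negative sampling; updates vectors in place.

    Returns the loss computed before the update.
    """
    pos_score = dot(center, context)
    neg_scores = [dot(center, n) for n in negatives]
    loss = -math.log(sigmoid(pos_score)) - sum(math.log(sigmoid(-s)) for s in neg_scores)

    # Derivatives of the loss with respect to each score.
    g_pos = sigmoid(pos_score) - 1.0
    g_negs = [sigmoid(s) for s in neg_scores]

    # Gradient for the center vector, computed from the old context/negative vectors.
    grad_center = [g_pos * c for c in context]
    for g, n in zip(g_negs, negatives):
        grad_center = [gc + g * x for gc, x in zip(grad_center, n)]

    # Update output vectors using the (not yet updated) center vector.
    for j in range(len(context)):
        context[j] -= lr * g_pos * center[j]
    for g, n in zip(g_negs, negatives):
        for j in range(len(n)):
            n[j] -= lr * g * center[j]
    for j in range(len(center)):
        center[j] -= lr * grad_center[j]
    return loss
```

Negative sampling replaces a full softmax over the vocabulary with a handful of binary classifications per training pair, which keeps each gradient step cheap even when the "vocabulary" of values taken from base tables is large.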
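
q-error, the metric behind the 14.6% figure, is conventionally defined as the factor by which an estimate deviates from the true cardinality, max(est/true, true/est). It is therefore symmetric in over- and under-estimation and never below 1. The sketch uses that standard definition. The abstract does not say whether the reported reduction refers to the mean, median, or a percentile of q-errors, so `median_q_error` is only an illustrative aggregate.

```python
from statistics import median
from typing import List, Sequence


def q_error(estimate: float, actual: float) -> float:
    """q-error = max(est/actual, actual/est); 1.0 means a perfect estimate."""
    if estimate <= 0 or actual <= 0:
        raise ValueError("q-error is defined for positive cardinalities")
    return max(estimate / actual, actual / estimate)


def median_q_error(estimates: Sequence[float], actuals: Sequence[float]) -> float:
    """Median q-error over a workload (one illustrative aggregate)."""
    errors: List[float] = [q_error(e, a) for e, a in zip(estimates, actuals)]
    return median(errors)
```

Evaluations often clamp zero cardinalities to 1 before taking the ratio; this sketch instead rejects non-positive inputs so that the clamping policy stays an explicit choice of the caller.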

Author information

Correspondence to Nan Han.

Supplementary Information

ESM 1

(PDF 201 kb)

About this article

Cite this article

Qiao, SJ., Yang, GP., Han, N. et al. Cardinality Estimator: Processing SQL with a Vertical Scanning Convolutional Neural Network. J. Comput. Sci. Technol. 36, 762–777 (2021). https://doi.org/10.1007/s11390-021-1351-7

  • DOI: https://doi.org/10.1007/s11390-021-1351-7