Abstract
Cardinality estimation for spatial queries plays an important role in query scheduling and optimization. Spatial datasets are fully dynamic, which calls for a cardinality estimator that is update-friendly, low-latency, and accurate. However, existing cardinality estimation methods suffer from time-consuming updates, inefficient estimation, or both. This work proposes SAFE (Sampling-Assisted Fast learned cardinality Estimator), which is designed specifically for dynamic spatial data. We develop a sampling strategy that uses quad-tree-based data partitioning to extract a small subset of the data, which enables fast training of cardinality estimation models. In addition, we employ 2-tier regression models to approximate the spatial data distribution while achieving accurate and fast cardinality estimation. Furthermore, we provide an incremental model update strategy that avoids re-training all models from scratch when updates arrive. We conduct experiments on real and synthetic datasets. The results demonstrate that SAFE (i) outperforms state-of-the-art cardinality estimation models and (ii) efficiently handles data updates while ensuring accurate and low-latency estimation.
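The abstract does not spell out how the quad-tree-based sampling works, so the following Python sketch is only an illustration of the general idea, not the procedure used by SAFE. It assumes 2D points given as (x, y) tuples, a leaf capacity, and a per-leaf sampling rate. All names (`quadtree_leaves`, `sample_by_leaf`, `estimate_count`) and parameter values are hypothetical. A plain weighted-sample count stands in for the regression models, which are not reproduced here.

```python
import random


def quadtree_leaves(points, box, capacity=64, depth=0, max_depth=12):
    """Split `box` into quadrants until each leaf holds at most `capacity` points.

    `box` is (x0, y0, x1, y1). Returns a list of (box, points) pairs for
    the non-empty leaves. `max_depth` guards against duplicate points.
    """
    if len(points) <= capacity or depth >= max_depth:
        return [(box, points)]
    x0, y0, x1, y1 = box
    mx, my = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    boxes = {
        (False, False): (x0, y0, mx, my),
        (False, True): (x0, my, mx, y1),
        (True, False): (mx, y0, x1, my),
        (True, True): (mx, my, x1, y1),
    }
    quads = {key: [] for key in boxes}
    for p in points:
        quads[(p[0] >= mx, p[1] >= my)].append(p)
    leaves = []
    for key, pts in quads.items():
        if pts:  # skip empty quadrants
            leaves.extend(
                quadtree_leaves(pts, boxes[key], capacity, depth + 1, max_depth)
            )
    return leaves


def sample_by_leaf(points, box, rate=0.05, capacity=64, seed=0):
    """Draw a small sample from every leaf (assumes 0 < rate <= 1).

    Each sampled point carries a weight equal to the number of points
    it represents in its leaf.
    """
    rng = random.Random(seed)
    sample = []
    for _, pts in quadtree_leaves(points, box, capacity):
        k = max(1, round(rate * len(pts)))  # at least one point per leaf
        weight = len(pts) / k
        for x, y in rng.sample(pts, k):
            sample.append((x, y, weight))
    return sample


def estimate_count(sample, query):
    """Estimate the number of points in `query` = (x0, y0, x1, y1)."""
    qx0, qy0, qx1, qy1 = query
    return sum(w for x, y, w in sample if qx0 <= x <= qx1 and qy0 <= y <= qy1)
```

Because every non-empty leaf contributes at least one sampled point, sparse regions stay represented in the subset, which is the usual motivation for partition-based sampling over uniform sampling. The weights sum to the dataset size, so a query covering the whole space returns the exact total.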
Notes
- 1. The order of the x- and y-dimensions is changeable.
- 2. It is also possible to use weights \(\{w_1, w_2, w_3\}\) for the criteria.
- 3.
- 4. As discussed in Sect. 2, updating QuickSel’s model for the up-to-date dataset requires computing the results of all existing queries, which takes more than a few seconds even for hundreds of queries. This is too slow for dynamic data, so we followed the original strategy of QuickSel (i.e., its model was not updated). In addition, LHist does not support data updates and needs reconstruction (incurring at least a few seconds for each update), so we did not use it as a competitor.
References
Amagata, D., Hara, T.: Monitoring MaxRS in spatial data streams. In: EDBT, pp. 317–328 (2016)
Amagata, D., Hara, T.: Identifying the most interactive object in spatial databases. In: ICDE, pp. 1286–1297 (2019)
Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509–517 (1975)
Eldawy, A., Mokbel, M.F.: The era of big spatial data. In: ICDE Workshops, pp. 42–49 (2015)
Finkel, R.A., Bentley, J.L.: Quad trees a data structure for retrieval on composite keys. Acta Informatica 4, 1–9 (1974)
Han, Y., et al.: Cardinality estimation in DBMS: a comprehensive benchmark evaluation. PVLDB 752–765 (2021)
Hasan, S., Thirumuruganathan, S., Augustine, J., Koudas, N., Das, G.: Deep learning models for selectivity estimation of multi-attribute queries. In: SIGMOD, pp. 1035–1050 (2020)
Hilprecht, B., Schmidt, A., Kulessa, M., Molina, A., Kersting, K., Binnig, C.: DeepDB: learn from data, not from queries! PVLDB 13(7), 992–1005 (2020)
Kipf, A., Kipf, T., Radke, B., Leis, V., Boncz, P.A., Kemper, A.: Learned cardinalities: estimating correlated joins with deep learning. In: CIDR (2019)
Lesh, F.H.: Multi-dimensional least-squares polynomial curve fitting. Commun. ACM 2(9), 29–30 (1959)
Liu, Q., Shen, Y., Chen, L.: LHist: towards learning multi-dimensional histogram for massive spatial data. In: ICDE, pp. 1188–1199 (2021)
Meng, Z., Wu, P., Cong, G., Zhu, R., Ma, S.: Unsupervised selectivity estimation by integrating Gaussian mixture models and an autoregressive model. In: EDBT, pp. 2–247 (2022)
Moti, M.H., Simatis, P., Papadias, D.: Waffle: a workload-aware and query-sensitive framework for disk-based spatial indexing. PVLDB 16(4), 670–683 (2022)
Nishio, S., Amagata, D., Hara, T.: Lamps: location-aware moving top-k pub/sub. IEEE Trans. Knowl. Data Eng. 34(01), 352–364 (2022)
Park, Y., Zhong, S., Mozafari, B.: QuickSel: quick selectivity learning with mixture models. In: SIGMOD, pp. 1017–1033 (2020)
Sun, L., Li, C., Ji, T., Chen, H.: MOSE: a monotonic selectivity estimator using learned CDF. IEEE Trans. Knowl. Data Eng. 35(3), 2823–2836 (2023)
Tang, M., Yu, Y., Malluhi, Q.M., Ouzzani, M., Aref, W.G.: LocationSpark: a distributed in-memory data management system for big spatial data. PVLDB 9(13), 1565–1568 (2016)
Wang, J., Chai, C., Liu, J., Li, G.: Face: a normalizing flow based cardinality estimator. PVLDB 15(1), 72–84 (2021)
Wu, J., Zhang, Y., Chen, S., Wang, J., Chen, Y., Xing, C.: Updatable learned index with precise positions. PVLDB 14(8), 1276–1288 (2021)
Wu, P., Cong, G.: A unified deep model of learning from both data and queries for cardinality estimation. In: SIGMOD, pp. 2009–2022 (2021)
Yang, Z., et al.: NeuroCard: one cardinality estimator for all tables. PVLDB 14(1), 61–73 (2020)
Acknowledgements
This work was partially supported by AIP Acceleration Research JPMJCR23U2, JST, and JSPS KAKENHI Grant Number 24K14961.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ji, Y., Amagata, D., Sasaki, Y., Hara, T. (2024). SAFE: Sampling-Assisted Fast Learned Cardinality Estimation for Dynamic Spatial Data. In: Strauss, C., Amagasa, T., Manco, G., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2024. Lecture Notes in Computer Science, vol 14911. Springer, Cham. https://doi.org/10.1007/978-3-031-68312-1_16
DOI: https://doi.org/10.1007/978-3-031-68312-1_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-68311-4
Online ISBN: 978-3-031-68312-1
eBook Packages: Computer Science (R0)