SAFE: Sampling-Assisted Fast Learned Cardinality Estimation for Dynamic Spatial Data

Ji, Yuchen; Amagata, Daichi; Sasaki, Yuya; Hara, Takahiro

doi:10.1007/978-3-031-68312-1_16

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14911)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

Cardinality estimation for spatial queries plays an important role in query scheduling and optimization. Spatial datasets are fully dynamic, and this setting necessitates an update-friendly, low-latency, and accurate cardinality estimator. However, existing cardinality estimation methods suffer from time-consuming updates, inefficient estimation, or both. This work proposes SAFE (Sampling-Assisted Fast learned cardinality Estimator), which is carefully designed for dynamic spatial data. Specifically, we develop a sampling strategy that uses quad-tree-based data partitioning to extract a small subset of the data, which enables fast training of cardinality estimation models. In addition, we employ 2-tier regression models to approximate the spatial data distribution while achieving accurate and fast cardinality estimation. Furthermore, we provide an incremental model update strategy that avoids re-training all models from scratch when updates arrive. We conduct experiments on real and synthetic datasets. The results demonstrate that SAFE (i) outperforms state-of-the-art cardinality estimation models and (ii) efficiently handles data updates while ensuring accurate and low-latency estimation.
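
To make the idea of quad-tree-based sampling concrete, the following is a minimal sketch and not the authors' algorithm: the abstract does not specify how SAFE selects its subset, so the per-leaf sampling rule, the `capacity` and `rate` parameters, and all function names here are assumptions introduced for illustration. The sketch partitions 2-D points with a quad-tree, draws a small weighted sample from each leaf, and uses the weights to estimate the cardinality of a rectangular range query.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def quadtree_leaves(points: List[Point], box: Box, capacity: int,
                    depth: int = 0, max_depth: int = 16) -> List[List[Point]]:
    """Split `box` into quadrants until each leaf holds at most `capacity`
    points (or `max_depth` is reached); return the non-empty leaves."""
    if len(points) <= capacity or depth >= max_depth:
        return [points] if points else []
    xmin, ymin, xmax, ymax = box
    xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
    boxes = {
        (False, False): (xmin, ymin, xmid, ymid),
        (True, False): (xmid, ymin, xmax, ymid),
        (False, True): (xmin, ymid, xmid, ymax),
        (True, True): (xmid, ymid, xmax, ymax),
    }
    buckets = {key: [] for key in boxes}
    for p in points:
        buckets[(p[0] >= xmid, p[1] >= ymid)].append(p)
    leaves: List[List[Point]] = []
    for key, child_box in boxes.items():
        leaves.extend(quadtree_leaves(buckets[key], child_box, capacity,
                                      depth + 1, max_depth))
    return leaves


def stratified_sample(points: List[Point], box: Box, capacity: int,
                      rate: float, seed: int = 0) -> List[Tuple[Point, float]]:
    """Sample roughly `rate` of each leaf (at least one point per leaf).
    Each sampled point carries weight = leaf size / number sampled."""
    rng = random.Random(seed)
    sample = []
    for leaf in quadtree_leaves(points, box, capacity):
        k = max(1, round(rate * len(leaf)))
        weight = len(leaf) / k
        sample.extend((p, weight) for p in rng.sample(leaf, k))
    return sample


def estimate_count(sample: List[Tuple[Point, float]], query: Box) -> float:
    """Estimate the number of points inside `query` as the sum of the
    weights of the sampled points that fall inside it."""
    qxmin, qymin, qxmax, qymax = query
    return sum(w for (x, y), w in sample
               if qxmin <= x <= qxmax and qymin <= y <= qymax)


if __name__ == "__main__":
    rng = random.Random(1)
    data = [(rng.random(), rng.random()) for _ in range(2000)]
    sample = stratified_sample(data, (0.0, 0.0, 1.0, 1.0), capacity=64, rate=0.1)
    query = (0.2, 0.2, 0.6, 0.7)
    exact = sum(1 for x, y in data if 0.2 <= x <= 0.6 and 0.2 <= y <= 0.7)
    print(len(sample), exact, estimate_count(sample, query))
```

Because every leaf contributes at least one sample, dense and sparse regions are both represented, which is the usual motivation for partition-based sampling over uniform sampling on skewed spatial data. In SAFE the sampled subset is used to train regression models rather than being queried directly as done here.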

Notes

1. The order of the x- and y-dimensions is changeable.
2. It is also possible to use weights $\{w_1, w_2, w_3\}$ for the criteria.
3. http://download.geofabrik.de/.
4. As noted in Sect. 2, updating QuickSel’s model for the up-to-date dataset requires computing the results of all existing queries, which takes more than a few seconds even for hundreds of queries. This is too slow for dynamic data, so we followed the original strategy of QuickSel (i.e., its model was not updated). In addition, LHist does not support data updates and needs reconstruction (incurring at least a few seconds for each update), so we did not use it as a competitor.

References

Amagata, D., Hara, T.: Monitoring MaxRS in spatial data streams. In: EDBT, pp. 317–328 (2016)
Amagata, D., Hara, T.: Identifying the most interactive object in spatial databases. In: ICDE, pp. 1286–1297 (2019)
Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509–517 (1975)
Eldawy, A., Mokbel, M.F.: The era of big spatial data. In: ICDE Workshops, pp. 42–49 (2015)
Finkel, R.A., Bentley, J.L.: Quad trees a data structure for retrieval on composite keys. Acta Informatica 4, 1–9 (1974)
Han, Y., et al.: Cardinality estimation in DBMS: a comprehensive benchmark evaluation. PVLDB 752–765 (2021)
Hasan, S., Thirumuruganathan, S., Augustine, J., Koudas, N., Das, G.: Deep learning models for selectivity estimation of multi-attribute queries. In: SIGMOD, pp. 1035–1050 (2020)
Hilprecht, B., Schmidt, A., Kulessa, M., Molina, A., Kersting, K., Binnig, C.: DeepDB: learn from data, not from queries! PVLDB 13(7), 992–1005 (2020)
Kipf, A., Kipf, T., Radke, B., Leis, V., Boncz, P.A., Kemper, A.: Learned cardinalities: estimating correlated joins with deep learning. In: CIDR (2019)
Lesh, F.H.: Multi-dimensional least-squares polynomial curve fitting. Commun. ACM 2(9), 29–30 (1959)
Liu, Q., Shen, Y., Chen, L.: LHist: towards learning multi-dimensional histogram for massive spatial data. In: ICDE, pp. 1188–1199 (2021)
Meng, Z., Wu, P., Cong, G., Zhu, R., Ma, S.: Unsupervised selectivity estimation by integrating Gaussian mixture models and an autoregressive model. In: EDBT, pp. 2–247 (2022)
Moti, M.H., Simatis, P., Papadias, D.: Waffle: a workload-aware and query-sensitive framework for disk-based spatial indexing. PVLDB 16(4), 670–683 (2022)
Nishio, S., Amagata, D., Hara, T.: Lamps: location-aware moving top-k pub/sub. IEEE Trans. Knowl. Data Eng. 34(1), 352–364 (2022)
Park, Y., Zhong, S., Mozafari, B.: QuickSel: quick selectivity learning with mixture models. In: SIGMOD, pp. 1017–1033 (2020)
Sun, L., Li, C., Ji, T., Chen, H.: MOSE: a monotonic selectivity estimator using learned CDF. IEEE Trans. Knowl. Data Eng. 35(3), 2823–2836 (2023)
Tang, M., Yu, Y., Malluhi, Q.M., Ouzzani, M., Aref, W.G.: LocationSpark: a distributed in-memory data management system for big spatial data. PVLDB 9(13), 1565–1568 (2016)
Wang, J., Chai, C., Liu, J., Li, G.: Face: a normalizing flow based cardinality estimator. PVLDB 15(1), 72–84 (2021)
Wu, J., Zhang, Y., Chen, S., Wang, J., Chen, Y., Xing, C.: Updatable learned index with precise positions. PVLDB 14(8), 1276–1288 (2021)
Wu, P., Cong, G.: A unified deep model of learning from both data and queries for cardinality estimation. In: SIGMOD, pp. 2009–2022 (2021)
Yang, Z., et al.: NeuroCard: one cardinality estimator for all tables. PVLDB 14(1), 61–73 (2020)

Acknowledgements

This work was partially supported by AIP Acceleration Research JPMJCR23U2, JST, and JSPS KAKENHI Grant Number 24K14961.

Author information

Authors and Affiliations

Osaka University, Osaka, Japan
Yuchen Ji, Daichi Amagata, Yuya Sasaki & Takahiro Hara

Corresponding authors

Correspondence to Yuchen Ji or Daichi Amagata.

Editor information

Editors and Affiliations

University of Vienna, Vienna, Austria
Christine Strauss
University of Tsukuba, Tsukuba, Japan
Toshiyuki Amagasa
National Research Council (CNR), Rende, Italy
Giuseppe Manco
Johannes Kepler University Linz, Linz, Austria
Gabriele Kotsis
Vienna University of Technology, Vienna, Austria
A Min Tjoa
Johannes Kepler University Linz, Linz, Austria
Ismail Khalil

Cite this paper

Ji, Y., Amagata, D., Sasaki, Y., Hara, T. (2024). SAFE: Sampling-Assisted Fast Learned Cardinality Estimation for Dynamic Spatial Data. In: Strauss, C., Amagasa, T., Manco, G., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2024. Lecture Notes in Computer Science, vol 14911. Springer, Cham. https://doi.org/10.1007/978-3-031-68312-1_16

DOI: https://doi.org/10.1007/978-3-031-68312-1_16
Published: 17 August 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-68311-4
Online ISBN: 978-3-031-68312-1
eBook Packages: Computer Science, Computer Science (R0)
