SAFE: Sampling-Assisted Fast Learned Cardinality Estimation for Dynamic Spatial Data

  • Conference paper
Database and Expert Systems Applications (DEXA 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14911)

Abstract

Cardinality estimation for spatial queries plays an important role in query scheduling and optimization. Spatial datasets are fully dynamic, and this setting necessitates an update-friendly, low-latency, and accurate cardinality estimator. However, existing cardinality estimation methods suffer from time-consuming updates and/or inefficient estimation. This work proposes SAFE (Sampling-Assisted Fast learned cardinality Estimator), which is carefully designed for dynamic spatial data. Specifically, we develop a sampling strategy that uses quad-tree-based data partitioning and extracts a small subset of the data to enable fast training of cardinality estimation models. In addition, we employ 2-tier regression models to approximate the spatial data distribution while achieving accurate and fast cardinality estimation. Furthermore, we provide an incremental model update strategy that avoids re-training all models from scratch when updates arrive. We conduct experiments on real and synthetic datasets. The results demonstrate that SAFE (i) outperforms state-of-the-art cardinality estimation models and (ii) efficiently handles data updates while ensuring accurate and low-latency estimation.
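
To make the idea of quad-tree-based partitioning followed by subset extraction more concrete, the following is a minimal Python sketch. It is not the SAFE algorithm: the abstract does not specify the sampling rule, so the sketch assumes a plain stratified sample (a fixed fraction drawn from every quad-tree leaf) and answers a range query by summing sample weights instead of training regression models. All function names, the leaf capacity of 64, and the 10% sampling rate are hypothetical choices made for illustration.

```python
import random


def quadtree_leaves(points, box, capacity=64, depth=0, max_depth=12):
    """Split box = (x0, y0, x1, y1) into quadrants until every leaf holds
    at most `capacity` points (or `max_depth` is reached).
    Returns the non-empty leaves as lists of (x, y) points."""
    if len(points) <= capacity or depth >= max_depth:
        return [points] if points else []
    x0, y0, x1, y1 = box
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    children = [[], [], [], []]
    for x, y in points:
        # Child index: 2 * (right half?) + (upper half?)
        children[2 * (x >= mx) + (y >= my)].append((x, y))
    boxes = [(x0, y0, mx, my), (x0, my, mx, y1),
             (mx, y0, x1, my), (mx, my, x1, y1)]
    leaves = []
    for child, child_box in zip(children, boxes):
        leaves.extend(
            quadtree_leaves(child, child_box, capacity, depth + 1, max_depth))
    return leaves


def stratified_sample(leaves, rate, rng):
    """Assumed sampling rule: draw about `rate` of the points from each leaf.
    Every sampled point carries the weight leaf_size / sample_size."""
    sample = []
    for leaf in leaves:
        k = max(1, round(rate * len(leaf)))
        weight = len(leaf) / k
        sample.extend((x, y, weight) for x, y in rng.sample(leaf, k))
    return sample


def estimate_count(sample, query):
    """Estimate the number of points inside query = (x0, y0, x1, y1)
    by summing the weights of the sampled points that fall inside it."""
    qx0, qy0, qx1, qy1 = query
    return sum(w for x, y, w in sample
               if qx0 <= x <= qx1 and qy0 <= y <= qy1)


rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(10_000)]
leaves = quadtree_leaves(points, (0.0, 0.0, 1.0, 1.0))
sample = stratified_sample(leaves, rate=0.1, rng=rng)

query = (0.2, 0.2, 0.6, 0.7)
exact = sum(1 for x, y in points if 0.2 <= x <= 0.6 and 0.2 <= y <= 0.7)
print("exact:", exact, "estimated:", round(estimate_count(sample, query)))
```

Because each leaf contributes samples in proportion to its size, dense regions are not under-represented in the subset, which is one plausible reason to partition before sampling. In a learned estimator, such a subset would serve as training data for the models rather than being queried directly as done here.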

Notes

  1. The order of the x- and y-dimensions is changeable.

  2. It is also possible to use weights \(\{w_1, w_2, w_3\}\) for the criteria.

  3. http://download.geofabrik.de/.

  4. As discussed in Sect. 2, updating QuickSel’s model for the up-to-date dataset requires computing the results of all existing queries, which takes more than a few seconds even for hundreds of queries. This is too slow for dynamic data, so we followed the original strategy of QuickSel (i.e., its model was not updated). In addition, LHist does not support data updates and needs reconstruction (incurring at least a few seconds for each update), so we did not use it as a competitor.

References

  1. Amagata, D., Hara, T.: Monitoring MaxRS in spatial data streams. In: EDBT, pp. 317–328 (2016)

  2. Amagata, D., Hara, T.: Identifying the most interactive object in spatial databases. In: ICDE, pp. 1286–1297 (2019)

  3. Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509–517 (1975)

  4. Eldawy, A., Mokbel, M.F.: The era of big spatial data. In: ICDE Workshops, pp. 42–49 (2015)

  5. Finkel, R.A., Bentley, J.L.: Quad trees: a data structure for retrieval on composite keys. Acta Informatica 4, 1–9 (1974)

  6. Han, Y., et al.: Cardinality estimation in DBMS: a comprehensive benchmark evaluation. PVLDB 752–765 (2021)

  7. Hasan, S., Thirumuruganathan, S., Augustine, J., Koudas, N., Das, G.: Deep learning models for selectivity estimation of multi-attribute queries. In: SIGMOD, pp. 1035–1050 (2020)

  8. Hilprecht, B., Schmidt, A., Kulessa, M., Molina, A., Kersting, K., Binnig, C.: DeepDB: learn from data, not from queries! PVLDB 13(7), 992–1005 (2020)

  9. Kipf, A., Kipf, T., Radke, B., Leis, V., Boncz, P.A., Kemper, A.: Learned cardinalities: estimating correlated joins with deep learning. In: CIDR (2019)

  10. Lesh, F.H.: Multi-dimensional least-squares polynomial curve fitting. Commun. ACM 2(9), 29–30 (1959)

  11. Liu, Q., Shen, Y., Chen, L.: LHist: towards learning multi-dimensional histogram for massive spatial data. In: ICDE, pp. 1188–1199 (2021)

  12. Meng, Z., Wu, P., Cong, G., Zhu, R., Ma, S.: Unsupervised selectivity estimation by integrating Gaussian mixture models and an autoregressive model. In: EDBT, pp. 2–247 (2022)

  13. Moti, M.H., Simatis, P., Papadias, D.: Waffle: a workload-aware and query-sensitive framework for disk-based spatial indexing. PVLDB 16(4), 670–683 (2022)

  14. Nishio, S., Amagata, D., Hara, T.: Lamps: location-aware moving top-k pub/sub. IEEE Trans. Knowl. Data Eng. 34(01), 352–364 (2022)

  15. Park, Y., Zhong, S., Mozafari, B.: QuickSel: quick selectivity learning with mixture models. In: SIGMOD, pp. 1017–1033 (2020)

  16. Sun, L., Li, C., Ji, T., Chen, H.: MOSE: a monotonic selectivity estimator using learned CDF. IEEE Trans. Knowl. Data Eng. 35(3), 2823–2836 (2023)

  17. Tang, M., Yu, Y., Malluhi, Q.M., Ouzzani, M., Aref, W.G.: LocationSpark: a distributed in-memory data management system for big spatial data. PVLDB 9(13), 1565–1568 (2016)

  18. Wang, J., Chai, C., Liu, J., Li, G.: Face: a normalizing flow based cardinality estimator. PVLDB 15(1), 72–84 (2021)

  19. Wu, J., Zhang, Y., Chen, S., Wang, J., Chen, Y., Xing, C.: Updatable learned index with precise positions. PVLDB 14(8), 1276–1288 (2021)

  20. Wu, P., Cong, G.: A unified deep model of learning from both data and queries for cardinality estimation. In: SIGMOD, pp. 2009–2022 (2021)

  21. Yang, Z., et al.: NeuroCard: one cardinality estimator for all tables. PVLDB 14(1), 61–73 (2020)

Acknowledgements

This work was partially supported by AIP Acceleration Research JPMJCR23U2, JST, and JSPS KAKENHI Grant Number 24K14961.

Author information

Corresponding authors

Correspondence to Yuchen Ji or Daichi Amagata.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ji, Y., Amagata, D., Sasaki, Y., Hara, T. (2024). SAFE: Sampling-Assisted Fast Learned Cardinality Estimation for Dynamic Spatial Data. In: Strauss, C., Amagasa, T., Manco, G., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2024. Lecture Notes in Computer Science, vol 14911. Springer, Cham. https://doi.org/10.1007/978-3-031-68312-1_16

  • DOI: https://doi.org/10.1007/978-3-031-68312-1_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-68311-4

  • Online ISBN: 978-3-031-68312-1

  • eBook Packages: Computer Science, Computer Science (R0)