Abstract
Spatial textual indexing techniques enable efficient access to and processing of large volumes of geospatial data, and recent research has shown that learned indexes can outperform conventional indexes. In this paper, we present a learned spatial textual index designed to process spatial textual data efficiently. Specifically, the proposed index is built from a radix table, spline points, and inverted lists, and Morton encoding is used to map multi-dimensional coordinates to one dimension. To handle data insertions, deletions, and updates in real time, a gap array stores the underlying data, and a space reallocation strategy that operates in units of spline points is designed. Based on the index, we propose query processing algorithms that handle different spatial keyword queries efficiently. An optimizer based on a random forest regression model is also designed to select index parameters that minimize query latency. We evaluate the proposed index against the IR-tree, and the results show that it outperforms the IR-tree in construction time, index size, and query efficiency.
Data Availability
Not Applicable.
Human and Animal Ethics
Not Applicable.
Notes
The Z-order curve is a space-filling curve that maps multi-dimensional data to a one-dimensional space. It preserves locality reasonably well: points that are close to each other in the multi-dimensional space tend to have close Z-values.
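As an illustration of the Z-order (Morton) mapping described in this note, the following is a minimal sketch for the two-dimensional case. It assumes that coordinates have already been quantized to non-negative 16-bit integer grid cells; the quantization step and the function names (`part1by1`, `morton_encode`, `morton_decode`) are assumptions made for this example and are not taken from the paper.

```python
def part1by1(n: int) -> int:
    """Spread the low 16 bits of n so that a zero bit follows each original bit."""
    n &= 0x0000FFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n


def compact1by1(n: int) -> int:
    """Inverse of part1by1: collect every second bit back into the low 16 bits."""
    n &= 0x55555555
    n = (n | (n >> 1)) & 0x33333333
    n = (n | (n >> 2)) & 0x0F0F0F0F
    n = (n | (n >> 4)) & 0x00FF00FF
    n = (n | (n >> 8)) & 0x0000FFFF
    return n


def morton_encode(x: int, y: int) -> int:
    """Interleave the bits of x (even positions) and y (odd positions)."""
    return part1by1(x) | (part1by1(y) << 1)


def morton_decode(z: int) -> tuple[int, int]:
    """Recover the (x, y) grid cell from a Z-value."""
    return compact1by1(z), compact1by1(z >> 1)


if __name__ == "__main__":
    # The four cells of a 2x2 block receive consecutive Z-values.
    for y in range(2):
        for x in range(2):
            print((x, y), morton_encode(x, y))
```

Sorting points by their Z-value yields a one-dimensional key sequence over which a one-dimensional structure can be built. Locality is only approximate: cells that are adjacent across a block boundary can have Z-values that are far apart.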
All range queries are square-shaped, with side lengths ranging from 2 km to 10 km.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant 62172179.
Funding
Not Applicable.
Author information
Contributions
X. Ding and Y. Zheng wrote the main manuscript text and prepared the figures; Raymond and H. Jin improved the paper. Z. Wang and Y. Zheng wrote the revised manuscript and the report. All authors reviewed the manuscript.
Ethics declarations
Ethical Approval and Consent to participate
Not Applicable.
Consent for publication
We have reviewed the final version of the manuscript and approve it for publication. To the best of our knowledge and belief, neither the entire paper nor any part of its content has been published or has been accepted for publication elsewhere.
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ding, X., Zheng, Y., Wang, Z. et al. A learned spatial textual index for efficient keyword queries. J Intell Inf Syst 60, 803–827 (2023). https://doi.org/10.1007/s10844-022-00752-2
DOI: https://doi.org/10.1007/s10844-022-00752-2