research-article

Spatial joins: what's next?

Published: 05 August 2019 Publication History

Abstract

The spatial join is a popular operation in spatial database systems and its evaluation is a well-studied problem. This paper reviews research and recent trends on spatial join evaluation. The complexity of different data types, the consideration of different join predicates, the use of modern commodity hardware, and support for parallel processing open the road to a number of interesting directions for future research, some of which we outline in the paper.
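As a point of reference for the operation the abstract discusses, the following is a minimal sketch of the filter step of an intersection join over minimum bounding rectangles (MBRs). It is an illustration only, not the method of the paper: the tuple layout `(xmin, ymin, xmax, ymax)` and the function names `intersects`, `nested_loop_join`, and `plane_sweep_join` are assumptions made for this example. It compares a brute-force nested loop with a forward-scan plane sweep along the x axis.

```python
from typing import List, Tuple

# Assumed representation: an MBR is (xmin, ymin, xmax, ymax).
Rect = Tuple[float, float, float, float]


def intersects(a: Rect, b: Rect) -> bool:
    """True if two closed rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def nested_loop_join(r: List[Rect], s: List[Rect]) -> List[Tuple[int, int]]:
    """Baseline: test every pair; returns index pairs (i, j)."""
    return [(i, j) for i, a in enumerate(r)
            for j, b in enumerate(s) if intersects(a, b)]


def plane_sweep_join(r: List[Rect], s: List[Rect]) -> List[Tuple[int, int]]:
    """Forward-scan plane sweep along x; returns index pairs (i, j)."""
    rs = sorted(range(len(r)), key=lambda i: r[i][0])  # r indices by xmin
    ss = sorted(range(len(s)), key=lambda j: s[j][0])  # s indices by xmin
    out: List[Tuple[int, int]] = []
    i = j = 0
    while i < len(rs) and j < len(ss):
        a, b = r[rs[i]], s[ss[j]]
        if a[0] <= b[0]:
            # a starts first: scan s rectangles that start before a ends.
            k = j
            while k < len(ss) and s[ss[k]][0] <= a[2]:
                c = s[ss[k]]
                if a[1] <= c[3] and c[1] <= a[3]:  # y ranges overlap
                    out.append((rs[i], ss[k]))
                k += 1
            i += 1
        else:
            # b starts first: scan r rectangles that start before b ends.
            k = i
            while k < len(rs) and r[rs[k]][0] <= b[2]:
                c = r[rs[k]]
                if b[1] <= c[3] and c[1] <= b[3]:  # y ranges overlap
                    out.append((rs[k], ss[j]))
                k += 1
            j += 1
    return out


if __name__ == "__main__":
    R = [(0, 0, 2, 2), (3, 3, 5, 5), (1, 1, 4, 4)]
    S = [(1, 1, 3, 3), (6, 6, 7, 7), (4, 0, 5, 1)]
    print(sorted(plane_sweep_join(R, S)))
```

Both functions produce only candidate pairs based on MBRs; in a full pipeline, a refinement step would then test the exact geometries of each candidate pair.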


Published In

SIGSPATIAL Special  Volume 11, Issue 1
March 2019
34 pages
EISSN:1946-7729
DOI:10.1145/3355491

Publisher

Association for Computing Machinery

New York, NY, United States

Cited By

  • (2025) PolyCard: A learned cardinality estimator for intersection queries on spatial polygons. Journal of Intelligent Information Systems. DOI: 10.1007/s10844-025-00921-z. Online publication date: 22-Jan-2025.
  • (2024) RayJoin: Fast and Precise Spatial Join. Proceedings of the 38th ACM International Conference on Supercomputing, pages 124-136. DOI: 10.1145/3650200.3656610. Online publication date: 30-May-2024.
  • (2024) FUDJ: Flexible User-Defined Distributed Joins. 2024 IEEE 40th International Conference on Data Engineering (ICDE), pages 4194-4207. DOI: 10.1109/ICDE60146.2024.00320. Online publication date: 13-May-2024.
  • (2024) Construct and Query A Fine-Grained Geospatial Knowledge Graph. Data Science and Engineering, 9(2):152-176. DOI: 10.1007/s41019-023-00237-4. Online publication date: 22-Jan-2024.
  • (2024) An effective spatial join method for blockchain-based geospatial data using hierarchical quadrant spatial LSM+ tree. The Journal of Supercomputing, 80(12):17492-17523. DOI: 10.1007/s11227-024-06134-5. Online publication date: 1-Aug-2024.
  • (2023) GLIN: A (G)eneric (L)earned (In)dexing Mechanism for Complex Geometries. Proceedings of the 11th ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data, pages 1-12. DOI: 10.1145/3615833.3628590. Online publication date: 13-Nov-2023.
  • (2023) Spatial Index Structures for Modern Storage Devices: A Survey. IEEE Transactions on Knowledge and Data Engineering, 35(9):9578-9597. DOI: 10.1109/TKDE.2023.3242207. Online publication date: 1-Sep-2023.
  • (2023) Defining and designing spatial queries: the role of spatial relationships. Geo-spatial Information Science, 27(6):1868-1892. DOI: 10.1080/10095020.2022.2163924. Online publication date: 17-May-2023.
  • (2022) A Novel Approach to Improve the Performance of the Database Storing Big Data with Time Information. Balkan Journal of Electrical and Computer Engineering, 10(4):388-396. DOI: 10.17694/bajece.1059070. Online publication date: 19-Oct-2022.
  • (2021) Efficient QoS-Aware Spatial Join Processing for Scalable NoSQL Storage Frameworks. IEEE Transactions on Network and Service Management, 18(2):2437-2449. DOI: 10.1109/TNSM.2020.3034150. Online publication date: Jun-2021.
