Skip to main content
Log in

Data Partitioning for Parallel Spatial Join Processing

  • Published:
GeoInformatica Aims and scope Submit manuscript

Abstract

The cost of spatial join processing can be very high because of the large sizes of spatial objects and the computation-intensive spatial operations. While parallel processing seems a natural solution to this problem, it is not clear how spatial data can be partitioned for this purpose. Various spatial data partitioning methods are examined in this paper. A framework combining the data-partitioning techniques used by most parallel join algorithms in relational databases and the filter-and-refine strategy for spatial operation processing is proposed for parallel spatial join processing. Object duplication caused by multi-assignment in spatial data partitioning can result in extra CPU cost as well as extra communication cost. We find that the key to overcome this problem is to preserve spatial locality in task decomposition. In this paper we show that a near-optimal speedup can be achieved for parallel spatial join processing using our new algorithms.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. D.J. Abel, V. Gaede, R.A. Power, and X. Zhou. “Resequencing and clustering to improve the performance of spatial join,” Technical report, CSIRO Mathematical and Information Sciences, Australia, 1997.

    Google Scholar 

  2. D.J. Abel, B.C. Ooi, K.-L. Tan, R. Power, and J.X. Yu. “Spatial join strategies in distributed spatial dbms,” in LNCS 951: Proc. of 4th Int. Symp. on Large Spatial Databases (SSD'95), 346–367, Springer-Verlag, 1995.

  3. D.J. Abel and J.L. Smith. “A data structure and algorithm based on a linear key for a rectangle retrieval problem,” Computer Vision, Graphics and Image Processing, 24(1):1–13, 1983.

    Google Scholar 

  4. T. Brinkhoff, H.P. Kriegel, and B. Seeger. “Efficient processing of spatial joins using R-trees,” in Proc. ACM SIGMOD Int. Conf. on Management of Data, 237–246, 1993.

  5. T. Brinkhoff, H.P. Kriegel, and B. Seeger. “Parallel processing of spatial joins using R-trees,” in Proc. of 12th International Conference on Data Engineering, 1996.

  6. D.J. DeWitt and J. Gray. “Parallel database systems: the future of database processing,” C. ACM, 35(6):85–98, 1992.

    Google Scholar 

  7. D.J. DeWitt et al. “Practical skew handling in parallel join,” in Proc. 18th Int. Conf. on Very Large Data Bases, Vancouver, Canada, 27–40, 1992.

  8. O. Günther. “Efficient computation of spatial joins,” in Proc. of 9th International Conference on Data Engineering, Vienna, Austria, 50–59, 1993.

  9. R.H. Güting. “An introduction to spatial database systems,” VLDB Journal, 3(4):357–399, 1994.

    Google Scholar 

  10. A. Guttman. “R-trees: A dynamic index structure for spatial searching,” in Proc. ACM SIGMOD Int. Conf. on Management of Data, 47–54, 1984.

  11. E.G. Hoel and H. Samet. “Performance of data-parallel spatial operations,” in Proc. 20th Int. Conf. on Very Large Data Bases, 156–167, 1995.

  12. E. Horowitz and S. Sahni. Fundamentals of Computer Algorithms, Computer Science Press, 1978.

  13. K.A. Hua and C. Lee. “Handing data skew in multiprocessor database computers using partition tuning,” in Proc. of 17th International Conference on Very Large Data Bases, Barcelona, 523–535, 1991.

  14. H. Ishihata, T. Horie, S. Inano, T. Shimizu, and S. Kato. “CAP-IID architecture,” in Proc. of the 1st Fujitsu-ANU CAP Workshop, Kawasaki, Japan, 1990.

  15. M. Kitsuregawa and Y. Ogawa. “Bucket spreading parallel hash: A new, robust, parallel hash join method for data skew in the super database computer (SDC),” in Proc. 16th Int. Conf. on Very Large Data Bases, 210–221, 1990.

  16. M. Kitsuregawa, H. Tanaka, and T. Motooka. “Application of hash to database machine and its architecture,” New Generation Computing, 1(1):66–74, 1983.

    Google Scholar 

  17. M.L. Lo and C.V. Ravishankar. “Spatial joins using seeded trees,” in Proc. ACM SIGMOD Int. Conf. on Management of Data, 209–220, 1994.

  18. M.L. Lo and C.V. Ravishankar. “Spatial hash-join,” in Proc. ACM SIGMOD Int. Conf. on Management of Data, Montreal, Canada, 247–258, 1996.

  19. J.H. Lu, B.C. Ooi, and K.L. Tan. Query Processing in Parallel Relational Database Systems, IEEE Computer Society Press, 1994.

  20. J. Orenstein and F.A. Manola. “Probe spatial data modeling and query processing in an image database application,” IEEE TOSE, 14(5):611–629, 1988.

    Google Scholar 

  21. J.M. Patel and D.J. DeWitt. “Partition based spatial-merge join,” in Proc. ACM SIGMOD Int. Conf. on Management of Data, Montreal, Canada, 259–270, 1996.

  22. F.P. Preparata and M.I. Shamos. Computational Geometry: an introduction. Springer-Verlag, 1985.

  23. T. Sellis, N. Roussopoulos, and C. Faloutsos. “The R+-tree: a dynamic index for multi-dimensional objects, ” in Proc. 13th Int. Conf. on Very Large Data Bases, 3–11, 1987.

  24. M. Stonebraker, J. Frew, K. Gardels, and J. Meredith. “The SEQUOIA 2000 storage benchmark,” in Proc. of ACM SIGMOD Int. Conf. on Management of Data, Washington, DC, 2–11, 1993.

  25. X. Zhou. Parallel Processing in Relational Database Systems, Ph.D. thesis, University of Queensland, 1994.

Download references

Author information

Authors and Affiliations

Authors

Rights and permissions

Reprints and permissions

About this article

Cite this article

Zhou, X., Abel, D.J. & Truffet, D. Data Partitioning for Parallel Spatial Join Processing. GeoInformatica 2, 175–204 (1998). https://doi.org/10.1023/A:1009755931056

Download citation

  • Issue Date:

  • DOI: https://doi.org/10.1023/A:1009755931056

Navigation