Abstract
In this paper we study the problem of how to perform spatial joins between two data sets with no pre-computed spatial indices. No techniques appear to exist to date that specifically target this problem. Our solution is also useful in the context of query optimization for complex spatial queries. In addition, we demonstrate that simple sampling techniques can be effective in reducing spatial join costs.
We extend the work in [LR94, LR95] and introduce the bootstrap-seeding technique, which allows seeded trees to be constructed directly from input data sets. We can thus dynamically construct two seeded trees for two data sets and perform a spatial join between them. The task of bootstrap-seeding comprises the subtasks of determining the number and the contents of the slots, and constructing the tree. Simple sampling techniques are used to determine the slot contents efficiently.
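As an illustration of the kind of simple sampling mentioned above, the following minimal sketch draws a uniform random sample of bounding rectangles from an input data set in a single pass, using reservoir sampling (Algorithm R, cf. the Vitter reference below). It is not taken from the paper: the `Rect` layout, the function name `reservoir_sample`, and the idea of feeding the sample into slot selection are assumptions made here for illustration only.

```python
import random
from typing import Iterable, List, Optional, Tuple

# Hypothetical rectangle layout: (xmin, ymin, xmax, ymax).
Rect = Tuple[float, float, float, float]


def reservoir_sample(
    rects: Iterable[Rect], k: int, rng: Optional[random.Random] = None
) -> List[Rect]:
    """Return up to k rectangles chosen uniformly at random in one pass."""
    rng = rng or random.Random()
    reservoir: List[Rect] = []
    for i, rect in enumerate(rects):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(rect)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = rect
    return reservoir


if __name__ == "__main__":
    # Usage: sample 10 rectangles from a synthetic data set of 1000.
    data = [(float(i), float(i), float(i + 1), float(i + 1)) for i in range(1000)]
    sample = reservoir_sample(data, 10, random.Random(0))
    print(len(sample))
```

Because the input is read only once and the reservoir holds just `k` items, a sample of this kind can be taken without first building an index over the data, which is the setting the abstract describes.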
Our experiments show that spatial joins using our methods are comparable in performance to joins between the same data sets with pre-computed R-trees, which confirms the viability of our method. When joining two data sets of different sizes, our studies suggest that it is beneficial to bootstrap an initial seeded tree for the smaller data set, and then to construct a seeded tree for the larger data set using copy-seeding and the seed level filtering technique.
This work was supported in part by the Consortium for International Earth Science Information Networking.
References
J. H. Ahrens. Sequential random sampling. ACM Transactions on Mathematical Software, 11(2):157–169, June 1985.
Thomas Brinkhoff, Hans-Peter Kriegel, and Bernhard Seeger. Efficient processing of spatial joins using R-trees. Proceedings of ACM SIGMOD International Conference on Management of Data, pages 237–246, May 1993.
Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger. The R*-tree: An efficient and robust access method for points and rectangles. Proceedings of ACM SIGMOD International Conference on Management of Data, pages 322–332, May 1990.
A. F. Cardenas. Analysis and performance of inverted data base structures. Communications of the ACM, 18(5):253–263, May 1975.
Brian Everitt. Cluster Analysis. Edward Arnold, London, third edition, 1993.
Christos Faloutsos, Timos Sellis, and Nick Roussopoulos. Analysis of object oriented spatial access methods. Proceedings of ACM SIGMOD International Conference on Management of Data, pages 427–439, 1987.
Antonin Guttman. R-trees: A dynamic index structure for spatial searching. Proceedings of ACM SIGMOD International Conference on Management of Data, pages 47–57, Aug. 1984.
Leonard Kaufman and Peter J. Rousseeuw. Finding Groups in Data, An Introduction to Cluster Analysis. John Wiley & Sons, Inc., New York, 1990.
Wei Lu and Jiawei Han. Distance-associated join indices for spatial range search. In Proceedings of International Conference on Data Engineering, pages 284–292, 1992.
Ming-Ling Lo and C. V. Ravishankar. Spatial joins using seeded trees. In Proceedings of ACM SIGMOD International Conference on Management of Data, pages 209–220, Minneapolis, MN, May 1994.
Ming-Ling Lo and C. V. Ravishankar. Seeded trees for spatial joins: Structure and implementation. Technical report, Department of EECS, University of Michigan, Ann Arbor, Michigan, 1995.
J. A. Orenstein. Redundancy in spatial databases. In Proceedings of ACM SIGMOD International Conference on Management of Data, Portland, OR, 1989.
Jack Orenstein. A comparison of spatial query processing techniques for native and parameter spaces. In Proceedings of ACM SIGMOD International Conference on Management of Data, pages 343–352, 1990.
Jack Orenstein. An algorithm for computing the overlay of k-dimensional spaces. In O. Gunther and H.-J. Schek, editors, Advances in Spatial Databases (SSD '91), pages 381–400, Zurich, Switzerland, August 28–30, 1991. Springer-Verlag.
D. Rotem. Spatial join indices. In Proceedings of International Conference on Data Engineering, pages 500–509, Kobe, Japan, 1991.
T. Sellis, N. Roussopoulos, and C. Faloutsos. The R+-tree: A dynamic index for multi-dimensional objects. In Proceedings of Very Large Data Bases, pages 3–11, Brighton, England, 1987.
P. Valduriez. Join indices. ACM Transactions on Database Systems, 12(2), 1987.
Jeffrey Scott Vitter. Faster methods for random sampling. Communications of the ACM, 27(7):703–718, July 1984.
J. S. Vitter. Random sampling with a reservoir. ACM Transactions on Mathematical Software, 11:37–57, March 1985.
S. B. Yao. Approximating block access in database organizations. Communications of the ACM, 20:260–261, Apr. 1977.
Copyright information
© 1995 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lo, ML., Ravishankar, C.V. (1995). Generating seeded trees from data sets. In: Egenhofer, M.J., Herring, J.R. (eds) Advances in Spatial Databases. SSD 1995. Lecture Notes in Computer Science, vol 951. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60159-7_20
DOI: https://doi.org/10.1007/3-540-60159-7_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60159-3
Online ISBN: 978-3-540-49536-9
eBook Packages: Springer Book Archive