Abstract
A growing number of web database applications deal with large georeferenced data sets. However, exploring these data sets through spatial queries can be very time- and resource-intensive. Many applications, such as Geographic Information Systems (GIS) used for decision support, therefore need interactive spatial queries. In this paper, we propose a new interactive spatial query processing technique for GIS. We present a family of Incremental Refining Spatial Join (IRSJ) algorithms that report incrementally refined running estimates for aggregate queries while simultaneously displaying the actual query result tuples of the data sampled so far. Our goal is to minimize the time until an acceptably accurate estimate of the query result, with accuracy measured by a confidence interval, is available to users. This approach enables more interactive data exploration and analysis. While similar work has been done in relational databases, to the best of our knowledge this is the first work to apply the approach in GIS. We investigate and evaluate different sampling methodologies through extensive experimental performance comparisons. Experiments on both real and synthetic data show an order-of-magnitude improvement in response time relative to obtaining the final answer with a full R-tree join. We also show the impact of different index structures on the performance of our algorithms using three known sampling methods.
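The idea of a running estimate that is refined as more data is sampled can be illustrated with a minimal sketch. This is not the IRSJ algorithms themselves. It assumes simple random sampling without replacement from one data set, a brute-force rectangle intersection test in place of an R-tree, a COUNT aggregate, and a normal-approximation confidence interval with a finite population correction. All names (`intersects`, `running_join_estimates`, `random_rects`) are hypothetical and chosen for illustration only.

```python
import math
import random


def intersects(a, b):
    """Test two axis-aligned rectangles given as (xmin, ymin, xmax, ymax)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def random_rects(count, rng, extent=100.0, max_side=5.0):
    """Generate synthetic rectangles (illustrative data only)."""
    rects = []
    for _ in range(count):
        x, y = rng.uniform(0, extent), rng.uniform(0, extent)
        w, h = rng.uniform(0, max_side), rng.uniform(0, max_side)
        rects.append((x, y, x + w, y + h))
    return rects


def running_join_estimates(r, s, z=1.96, report_every=50, seed=0):
    """Yield (n_sampled, estimate, half_width) for the join COUNT of r and s.

    Rectangles of r are visited in random order, which amounts to simple
    random sampling without replacement. After every `report_every`
    samples the per-rectangle match count is scaled up to estimate the
    total number of intersecting pairs.
    """
    rng = random.Random(seed)
    order = list(range(len(r)))
    rng.shuffle(order)
    n_total = len(r)
    total = 0      # sum of match counts seen so far
    total_sq = 0   # sum of squared match counts, used for the variance
    for n, idx in enumerate(order, start=1):
        matches = sum(1 for rect in s if intersects(r[idx], rect))
        total += matches
        total_sq += matches * matches
        if n % report_every == 0 or n == n_total:
            mean = total / n
            var = (total_sq - n * mean * mean) / (n - 1) if n > 1 else 0.0
            var = max(var, 0.0)        # guard against rounding error
            fpc = 1.0 - n / n_total    # finite population correction
            half_width = z * n_total * math.sqrt(var / n * fpc)
            yield n, n_total * mean, half_width


if __name__ == "__main__":
    data_rng = random.Random(42)
    r = random_rects(500, data_rng)
    s = random_rects(500, data_rng)
    for n, estimate, half_width in running_join_estimates(r, s):
        print(f"sampled {n:4d}: COUNT ~ {estimate:9.1f} +/- {half_width:.1f}")
```

A caller could stop consuming the generator as soon as the reported half-width falls below a chosen tolerance. Once every rectangle has been sampled, the interval collapses to zero width and the estimate equals the exact join count.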
Acknowledgements
The following people provided helpful suggestions during early stages of this work: Dr. Sada Narayanappa, Brandon Haenlein, and Mohammed Albow. Help in statistics was graciously provided by Dr. Petr Vojtěchovský.
Cite this article
Bae, W.D., Alkobaisi, S. & Leutenegger, S.T. IRSJ: incremental refining spatial joins for interactive queries in GIS. Geoinformatica 14, 507–543 (2010). https://doi.org/10.1007/s10707-009-0089-0
DOI: https://doi.org/10.1007/s10707-009-0089-0