Medoid Queries in Large Spatial Databases

  • Conference paper
Advances in Spatial and Temporal Databases (SSTD 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3633)

Abstract

Assume that a franchise plans to open k branches in a city, so that the average distance from each residential block to the closest branch is minimized. This is an instance of the k-medoids problem, where residential blocks constitute the input dataset and the k branch locations correspond to the medoids. Since the problem is NP-hard, research has focused on approximate solutions. Despite an avalanche of methods for small and moderate-size datasets, currently there exists no technique applicable to very large databases. In this paper, we provide efficient algorithms that utilize an existing data-partition index to achieve low CPU and I/O cost. In particular, we exploit the intrinsic grouping properties of the index in order to avoid reading the entire dataset. Furthermore, we apply our framework to solve medoid-aggregate queries, where k is not known in advance; instead, we are asked to compute a medoid set that leads to an average distance close to a user-specified parameter T. Compared to previous approaches, we achieve results of comparable or better quality at a small fraction of the CPU and I/O costs (seconds as opposed to hours, and tens of node accesses instead of thousands).
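To make the two query types concrete, the following is a minimal in-memory sketch of the objective described above. It is not the index-based method of the paper: it scans every point, uses a simple greedy heuristic, and assumes Euclidean distance. The function names (`avg_dist`, `greedy_medoids`, `medoid_aggregate`) are hypothetical, and reading "close to T" as "the greedy set whose average distance is nearest to T" is an assumption made for illustration.

```python
import math

def avg_dist(points, medoids):
    """Average distance from each point to its closest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points) / len(points)

def _next_medoid(points, medoids):
    """Pick the data point whose addition lowers the average distance most."""
    candidates = [p for p in points if p not in medoids]
    return min(candidates, key=lambda c: avg_dist(points, medoids + [c]))

def greedy_medoids(points, k):
    """Naive k-medoids heuristic: add k medoids one at a time (assumed baseline)."""
    medoids = []
    for _ in range(k):
        medoids.append(_next_medoid(points, medoids))
    return medoids

def medoid_aggregate(points, T):
    """Grow the medoid set until the average distance drops to T or below,
    then return whichever of the last two sets is closer to T (assumed reading)."""
    medoids, cost = [], math.inf
    prev, prev_cost = None, math.inf
    while len(medoids) < len(points):
        medoids = medoids + [_next_medoid(points, medoids)]
        cost = avg_dist(points, medoids)
        if cost <= T:
            # Adding a medoid never increases the cost, so stop here.
            if prev is not None and abs(prev_cost - T) < abs(cost - T):
                return prev, prev_cost
            return medoids, cost
        prev, prev_cost = medoids, cost
    return medoids, cost

# Four residential blocks forming two well-separated pairs.
blocks = [(0, 0), (0, 1), (10, 0), (10, 1)]
branches = greedy_medoids(blocks, 2)
chosen, achieved = medoid_aggregate(blocks, 0.5)
```

Each greedy step re-evaluates the full dataset for every candidate, which is the kind of cost the abstract says an index-based approach is meant to avoid on very large databases.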

Supported by grant HKUST 6180/03E from Hong Kong RGC.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mouratidis, K., Papadias, D., Papadimitriou, S. (2005). Medoid Queries in Large Spatial Databases. In: Bauzer Medeiros, C., Egenhofer, M.J., Bertino, E. (eds) Advances in Spatial and Temporal Databases. SSTD 2005. Lecture Notes in Computer Science, vol 3633. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11535331_4

  • DOI: https://doi.org/10.1007/11535331_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28127-6

  • Online ISBN: 978-3-540-31904-7

  • eBook Packages: Computer Science, Computer Science (R0)
