Medoid Queries in Large Spatial Databases

Mouratidis, Kyriakos; Papadias, Dimitris; Papadimitriou, Spiros

doi:10.1007/11535331_4

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3633)

Included in the following conference series:

International Symposium on Spatial and Temporal Databases

Abstract

Assume that a franchise plans to open k branches in a city, so that the average distance from each residential block to the closest branch is minimized. This is an instance of the k-medoids problem, where residential blocks constitute the input dataset and the k branch locations correspond to the medoids. Since the problem is NP-hard, research has focused on approximate solutions. Despite an avalanche of methods for small and moderate-size datasets, currently there exists no technique applicable to very large databases. In this paper, we provide efficient algorithms that utilize an existing data-partition index to achieve low CPU and I/O cost. In particular, we exploit the intrinsic grouping properties of the index in order to avoid reading the entire dataset. Furthermore, we apply our framework to solve medoid-aggregate queries, where k is not known in advance; instead, we are asked to compute a medoid set that leads to an average distance close to a user-specified parameter T. Compared to previous approaches, we achieve results of comparable or better quality at a small fraction of the CPU and I/O costs (seconds as opposed to hours, and tens of node accesses instead of thousands).
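
To make the two query types in the abstract concrete, here is a small in-memory sketch. It assumes Euclidean distance and uses a naive PAM-style swap heuristic for k-medoids, plus a brute-force loop over k for the medoid-aggregate variant. This is not the index-based method the paper proposes. The function names `avg_distance`, `k_medoids_swap` and `medoid_aggregate` are hypothetical, chosen only for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def avg_distance(points: Sequence[Point], medoids: Sequence[Point]) -> float:
    """Average Euclidean distance from each point to its closest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points) / len(points)


def k_medoids_swap(points: Sequence[Point], k: int, seed: int = 0,
                   max_rounds: int = 100) -> Tuple[List[Point], float]:
    """Naive PAM-style local search (hypothetical helper, not the paper's method).

    Starts from k randomly sampled points and keeps replacing a medoid with a
    non-medoid point whenever the swap lowers the average distance.
    """
    rng = random.Random(seed)
    medoids = rng.sample(list(points), k)
    best = avg_distance(points, medoids)
    for _ in range(max_rounds):
        improved = False
        for i in range(k):
            for cand in points:
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                cost = avg_distance(points, trial)  # full scan of the dataset
                if cost < best:
                    medoids, best, improved = trial, cost, True
        if not improved:  # local optimum: no single swap helps
            break
    return medoids, best


def medoid_aggregate(points: Sequence[Point], T: float,
                     seed: int = 0) -> Tuple[List[Point], float]:
    """Brute-force medoid-aggregate baseline (hypothetical helper).

    Grows k until the average distance drops to T or below, then returns
    whichever of the last two medoid sets has an average distance closer to T.
    """
    prev = None
    for k in range(1, len(points) + 1):
        medoids, cost = k_medoids_swap(points, k, seed)
        if cost <= T:
            if prev is not None and abs(prev[1] - T) < abs(cost - T):
                return prev
            return medoids, cost
        prev = (medoids, cost)
    return prev  # only reached if T < 0
```

Consider two well-separated triangles of points, `[(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]`. On this input, `k_medoids_swap(pts, 2)` settles on `(0, 0)` and `(10, 10)` with an average distance of 4/6, and `medoid_aggregate(pts, 1.0)` stops at k = 2. Every cost evaluation here reads the whole dataset. That is the cost the abstract says the paper avoids by exploiting the grouping properties of the index.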

Supported by grant HKUST 6180/03E from Hong Kong RGC.

Author information

Authors and Affiliations

Department of Computer Science, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
Kyriakos Mouratidis & Dimitris Papadias
Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
Spiros Papadimitriou

Editor information

Editors and Affiliations

Institute of Computing, CP 6176, University of Campinas, 13084-971, Campinas, Brazil
Claudia Bauzer Medeiros
National Center for Geographic Information and Analysis and Department of Spatial Information Science and Engineering, University of Maine, Boardman Hall, ME 04469-5711, Orono, USA
Max J. Egenhofer
Purdue University
Elisa Bertino

About this paper

Cite this paper

Mouratidis, K., Papadias, D., Papadimitriou, S. (2005). Medoid Queries in Large Spatial Databases. In: Bauzer Medeiros, C., Egenhofer, M.J., Bertino, E. (eds) Advances in Spatial and Temporal Databases. SSTD 2005. Lecture Notes in Computer Science, vol 3633. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11535331_4

DOI: https://doi.org/10.1007/11535331_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28127-6
Online ISBN: 978-3-540-31904-7
eBook Packages: Computer Science, Computer Science (R0)
