Abstract
Assume that a franchise plans to open k branches in a city, so that the average distance from each residential block to the closest branch is minimized. This is an instance of the k-medoids problem, where residential blocks constitute the input dataset and the k branch locations correspond to the medoids. Since the problem is NP-hard, research has focused on approximate solutions. Despite an avalanche of methods for small and moderate size datasets, currently there exists no technique applicable to very large databases. In this paper, we provide efficient algorithms that utilize an existing data-partition index to achieve low CPU and I/O cost. In particular, we exploit the intrinsic grouping properties of the index in order to avoid reading the entire dataset. Furthermore, we apply our framework to solve medoid-aggregate queries, where k is not known in advance; instead, we are asked to compute a medoid set that leads to an average distance close to a user-specified parameter T. Compared to previous approaches, we achieve results of comparable or better quality at a small fraction of the CPU and I/O costs (seconds as opposed to hours, and tens of node accesses instead of thousands).
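To make the two problem statements concrete, the following is a minimal in-memory sketch, not the index-based algorithms of the paper. It assumes 2D points under Euclidean distance, substitutes a simple greedy heuristic for the paper's method, and reads "close to T" as "at most T with the smallest k found". The function names `avg_distance`, `greedy_medoids`, and `medoid_aggregate` are hypothetical.

```python
import math

def avg_distance(points, medoids):
    """Average Euclidean distance from each point to its closest medoid."""
    total = sum(min(math.dist(p, m) for m in medoids) for p in points)
    return total / len(points)

def greedy_medoids(points, k, medoids=None):
    """Greedy baseline (an assumption, not the paper's algorithm):
    grow the medoid set to size k, each time adding the data point
    that lowers the average distance the most."""
    medoids = list(medoids or [])
    while len(medoids) < k:
        candidates = [p for p in points if p not in medoids]
        best = min(candidates, key=lambda p: avg_distance(points, medoids + [p]))
        medoids.append(best)
    return medoids

def medoid_aggregate(points, T):
    """Medoid-aggregate query, simplified: k is not given; return the
    first greedy medoid set whose average distance is at most T."""
    medoids = []
    for k in range(1, len(points) + 1):
        medoids = greedy_medoids(points, k, medoids)
        if avg_distance(points, medoids) <= T:
            break
    return medoids

# Residential blocks (toy data): two groups of two blocks each.
blocks = [(0, 0), (0, 1), (10, 0), (10, 1)]
branches = greedy_medoids(blocks, 2)
print(branches, avg_distance(blocks, branches))
print(len(medoid_aggregate(blocks, 0.5)))
```

Every call to `avg_distance` scans the whole dataset, which is exactly the cost that the abstract says the index-based approach is designed to avoid.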
Supported by grant HKUST 6180/03E from Hong Kong RGC.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mouratidis, K., Papadias, D., Papadimitriou, S. (2005). Medoid Queries in Large Spatial Databases. In: Bauzer Medeiros, C., Egenhofer, M.J., Bertino, E. (eds) Advances in Spatial and Temporal Databases. SSTD 2005. Lecture Notes in Computer Science, vol 3633. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11535331_4
DOI: https://doi.org/10.1007/11535331_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28127-6
Online ISBN: 978-3-540-31904-7
eBook Packages: Computer Science, Computer Science (R0)