Data Structures for Minimization of Total Within-Group Distance for Spatio-temporal Clustering

Estivill-Castro, Vladimir; Houle, Michael E.

doi:10.1007/3-540-44794-6_8

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2168)

Included in the following conference series:

European Conference on Principles of Data Mining and Knowledge Discovery

Abstract

Statistical principles suggest minimization of the total within-group distance (TWGD) as a robust criterion for clustering point data associated with a Geographical Information System [17]. Although it admits a linear programming formulation, this NP-hard problem must in practice be solved with heuristic methods. The heuristics proposed so far require quadratic time, which is prohibitively expensive for data mining applications. This paper introduces data structures for the management of large bi-dimensional point data sets and for fast clustering via interchange heuristics. These structures avoid the need for quadratic time by approximating proximity information. Our scheme is illustrated with two-dimensional quadtrees, but can be extended to use other structures suited to three-dimensional data or to spatial data with time stamps. As a result, we obtain a fast and robust clustering method.
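To make the criterion concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that TWGD is the sum of pairwise Euclidean distances between points assigned to the same group, a definition the abstract itself does not spell out. The names `twgd` and `interchange_pass` are hypothetical, and the reassignment pass is only a simple stand-in for the interchange heuristics mentioned above.

```python
from itertools import combinations
from math import dist


def twgd(points, labels):
    """Assumed TWGD: sum of pairwise Euclidean distances within each group."""
    groups = {}
    for p, g in zip(points, labels):
        groups.setdefault(g, []).append(p)
    return sum(
        dist(a, b)
        for members in groups.values()
        for a, b in combinations(members, 2)
    )


def interchange_pass(points, labels, k):
    """One naive reassignment pass over all points (illustrative only).

    For each point, compute the total distance to the current members of
    every group and move the point if another group is strictly cheaper.
    Each point scans all others, so a full pass costs O(n^2) distance
    evaluations.
    """
    labels = list(labels)
    moved = False
    for i, p in enumerate(points):
        cost = [0.0] * k
        for j, q in enumerate(points):
            if j != i:
                cost[labels[j]] += dist(p, q)
        best = min(range(k), key=cost.__getitem__)
        if cost[best] < cost[labels[i]]:
            labels[i] = best
            moved = True
    return labels, moved


if __name__ == "__main__":
    # Hypothetical data: the first point is initially placed in the wrong group.
    pts = [(10, 0), (0, 0), (0, 1), (1, 0), (10, 1)]
    start = [0, 0, 0, 0, 1]
    new_labels, moved = interchange_pass(pts, start, k=2)
    print(twgd(pts, start), twgd(pts, new_labels), new_labels)
```

Under these assumptions, every pass examines all pairs of points, which is the quadratic cost that the abstract describes as prohibitive and that the proposed data structures are meant to avoid by approximating proximity information.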

References

J. Barnes and P. Hut. A hierarchical O(n log n) force-calculation algorithm. Nature, 324:446–449, 1986.
L. Belbin. The use of non-hierarchical allocation methods for clustering large sets of data. Australian Comp. J., 19:32–41, 1987.
R. L. Bowerman, P. H. Calamai, and G. B. Hall. The demand partitioning method for reducing aggregation errors in p-median problems. Computers & Operations Research, 26:1097–1111, 1999.
P. S. Bradley, O. L. Mangasarian, and W. N. Street. Clustering via concave minimization. Advances in neural information processing systems, 9:368-, 1997.
J. Carrier, L. Greengard, and V. Rokhlin. A fast adaptive multipole algorithm for particle simulation. SIAM J. Scientific and Statistical Computing, 9:669–686, 1988.
A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. J. Royal Statistical Soc. B, 39:1–38, 1977.
R. Duda and P. Hart. Pattern Classification and Scene Analysis. Wiley, US, 1973.
E. Erkut and B. Bozkaya. Analysis of aggregation error for the p-median problem. Computers & Operations Research, 26:1075–1096, 1999.
M. Ester, H. P. Kriegel, and X. Xu. Knowledge discovery in large spatial databases: Focusing techniques for efficient class identification. SSD-95, 70–82, 1995. Springer-Verlag LNCS 951.
V. Estivill-Castro and M. E. Houle. Robust clustering of large geo-referenced data sets. PAKDD-99, 327–337. Springer-Verlag LNAI 1574, 1999.
V. Estivill-Castro and M. E. Houle. Fast randomized algorithms for robust estimation of location. Proc. Int. Workshop on Mining Spatial and Temporal Data (with PAKDD-2001), Hong Kong, 2001.
V. Estivill-Castro and M. E. Houle. Spatio-temporal data structures for minimization of total within-group distance. T. Rep. 2001-05, Dep. of CS & SE, U. of Newcastle. http://www.cs.newcastle.edu.au/Dept/techrep.html
V. Estivill-Castro and J. Yang. A fast and robust general purpose clustering algorithm. PRICAI 2000, 208–218, 2000. Springer-Verlag LNAI 1886.
U. Fayyad, C. Reina, and P. S. Bradley. Initialization of iterative refinement clustering algorithms. 4th KDD 194–198. AAAI Press, 1998.
T. M. J. Fruchterman and E. M. Reingold. Graph drawing by force-directed placement. Software Practice and Experience, 21:1129–1164, 1991.
M. Horn. Analysis and computation schemes for p-median heuristics. Environment and Planning A, 28:1699–1708, 1996.
A. T. Murray. Spatial characteristics and comparisons of interaction and median clustering models. Geographical Analysis, 32:1-, 2000.
A. T. Murray and R. L. Church. Applying simulated annealing to location-planning models. J. of Heuristics, 2:31–53, 1996.
R. T. Ng and J. Han. Efficient and effective clustering methods for spatial data mining. 20th VLDB, 144–155, 1994. Morgan Kaufmann.
J. J. Oliver, R. A. Baxter, and C. S. Wallace. Unsupervised learning using MML. 13th ML Conf., 364–372, 1996. Morgan Kaufmann.
S. Openshaw. Two exploratory space-time-attribute pattern analysers relevant to GIS. In Spatial Analysis and GIS, 83–104, UK, 1994. Taylor and Francis.
A. J. Quigley and P. Eades. FADE: Graph drawing, clustering and visual abstraction. 8th Symp. on Graph Drawing, 2000. Springer-Verlag LNCS 1984.
M. Rao. Cluster analysis and mathematical programming. J. Amer. Statistical Assoc., 66:622–626, 1971.
K. Rosing and C. ReVelle. Optimal clustering. Environment and Planning A, 18:1463–1476, 1986.
P. J. Rousseeuw and A. M. Leroy. Robust regression and outlier detection. Wiley, USA, 1987.
H. Samet. The Design and Analysis of Spatial Data Structures. Addison-Wesley, MA, 1989.
E. Schikuta and M. Erhart. The BANG-clustering system: Grid-based data analysis. IDA-97. Springer-Verlag LNCS 1280, 1997.
M. B. Teitz and P. Bart. Heuristic methods for estimating the generalized vertex median of a weighted graph. Operations Research, 16:955–961, 1968.
H. Vinod. Integer programming and the theory of grouping. J. Amer. Statistical Assoc., 64:506–517, 1969.
W. Wang, J. Yang, and R. Muntz. STING: A statistical information grid approach to spatial data mining. 23rd VLDB, 186–195, 1997. Morgan Kaufmann.
T. Zhang, R. Ramakrishnan, and M. Livny. BIRCH: An efficient data clustering method for very large databases. SIGMOD Record, 25:103–114, 1996.

Author information

Authors and Affiliations

Department of Computer Science & Software Engineering, The University of Newcastle, Callaghan, NSW, 2308, Australia
Vladimir Estivill-Castro
Basser Department of Computer Science, The University of Sydney, Sydney, NSW, 2006, Australia
Michael E. Houle

Editor information

Editors and Affiliations

Department of Computer Science, Albert-Ludwigs University Freiburg, Georges Köhler-Allee, Geb. 079, 79110, Freiburg, Germany
Luc De Raedt
Inst. of Information and Computing Sciences, Dept. of Mathematics and Computer Science, University of Utrecht, Padualaan 14, de Uithof, 3508 TB Utrecht, The Netherlands
Arno Siebes

About this paper

Cite this paper

Estivill-Castro, V., Houle, M.E. (2001). Data Structures for Minimization of Total Within-Group Distance for Spatio-temporal Clustering. In: De Raedt, L., Siebes, A. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 2001. Lecture Notes in Computer Science, vol 2168. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44794-6_8

DOI: https://doi.org/10.1007/3-540-44794-6_8
Published: 28 August 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42534-2
Online ISBN: 978-3-540-44794-8
eBook Packages: Springer Book Archive
