AUTOCLUST+: Automatic Clustering of Point-Data Sets in the Presence of Obstacles

Estivill-Castro, Vladimir; Lee, Ickjai

doi:10.1007/3-540-45244-3_11

AUTOCLUST+: Automatic Clustering of Point-Data Sets in the Presence of Obstacles

Vladimir Estivill-Castro³ &
Ickjai Lee³

Conference paper
First Online: 01 January 2001

1116 Accesses
23 Citations

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 2007))

Abstract

Wide spread clustering algorithms use the Euclidean distance to measure spatial proximity. However, obstacles in other GIS data-layers prevent traversing the straight path between two points. AUTOCLUST+ clusters points in the presence of obstacles based on Voronoi modeling and Delaunay Diagrams. The algorithm is free of user-supplied arguments and incorporates global and local variations. Thus, it detects high-quality clusters (clusters of arbitrary shapes, clusters of different densities, sparse clusters adjacent to high-density clusters, multiple bridges between clusters and closely located high-density clusters) without prior knowledge. Consequently, it successfully supports correlation analyses between layers (requiring high-quality clusters) and more general locational optimization problems in the presence of obstacles. All this within O(n log n+[m+R] log n) expected time, where n is the number of data points, m is the number of line-segments that determine the obstacles and R is the number of Delaunay edges intersecting some obstacles. A series of detailed performance evaluations illustrates the power of AUTOCLUST+ and confirms the virtues of our approach.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

J. D. Banfield and A. E. Raftery. Model-based Gaussian and Non-Gaussian Clustering. Biometrics, 49:803–821, 1993.
Article MATH MathSciNet Google Scholar
C. Eldershaw and M. Hegland. Cluster Analysis using Triangulation. In B. J. Noye, M. D. Teubner, and A. W. Gill, editors, Computational Techniques and Applications: CTAC97, pages 201–208. World Scientific, Singapore, 1997.
Google Scholar
M. Ester, M. P. Kriegel, J. Sander, and X. Xu. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, pages 226–231, 1996.
Google Scholar
V. Estivill-Castro. Hybrid genetic algorithms are better for spatial clustering. In Proc. 6th Pacific Rim Intern. Conf. Artificial Intelligence PRICAI 2000, Melbourne, 2000. Springer-Verlag Lecture Notes in AI, to appear.
Google Scholar
V. Estivill-Castro and M. E. Houle. Robust Clustering of Large Geo-referenced Data Sets. In Proceedings of the 3rd Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 327–337, 1999.
Google Scholar
V. Estivill-Castro and I. Lee. AMOEBA: Hierarchical Clustering Based on Spatial Proximity Using Delaunay Diagram. In Proceedings of the 9th International Symposium on Spatial Data Handling, 2000. to appear.
Google Scholar
V. Estivill-Castro and I. Lee. AUTOCLUST: Automatic Clustering via Boundary Extraction for Massive Point-data Sets. In Proceedings of the 5th International Conference on Geocomputation, 2000. to appear. Extended version is available at http://www.cs.newcastle.edu.au/Dept/techrep.html as a technical report.
V. Estivill-Castro and A.T. Murray. Discovering associations in spatial data-An efficient medoid based approach. In X. Wu, R. Kotagiri, and K.K. Korb, editors, Proceedings of the 2nd Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD-98), pages 110–121, Melbourne, Australia, 1998. Springer-Verlag Lecture Notes in Artificial Intelligence 1394.
Google Scholar
C. Fraley and A. E. Raftery. How Many Clusters? Which Clustering Method? Answers Via Model-Based Cluster Analysis. The Computer Journal, 41(8):578–588, 1998.
Article MATH Google Scholar
C. M. Gold. Problems with handling spatial data-The Voronoi approach. CISM Journal ACSGC, 45(1):65–80, 1991.
Google Scholar
C. M. Gold. The meaning of Neighbour. In G. Tinhofer and G. Schmidt, editors, Theories and Methods of Spatio-Temporal Reasoning in Geographic Space, pages 220–235, Berlin, 1992. Springer-Verlag Lecture Notes in Computer Science 639.
Google Scholar
M. F. Goodchild. Geographical information science. International Journal of Geographical Information Systems, 6(1):31–45, 1992.
Article Google Scholar
S. Guha, R. Rastogi, and K. Shim. CURE: An Efficient Clustering Algorithm for Large Databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 73–84, 1998.
Google Scholar
I. Kang, T. Kim, and K. Li. A Spatial Data Mining Method by Delaunay Triangulation. In Proceedings of the 5th International Workshop on Advances in Geographic Information Systems (GIS-97), pages 35–39, 1997.
Google Scholar
G. Karypis, E. Han, and V. Kumar. CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling. IEEE Computer: Special Issue on Data Analysis and Mining, 32(8):68–75, 1999.
Google Scholar
M. Keil and G Gutwin. The Delauney triangulation closely approximates the complete graph. In F. Denhe, J.-R. Sack, and N. Snatoro, editors, Proceedings of the First Workshop on Algorithms and Data Structures WADS-89, pages 47–56. Springer-Verlag Lecture Notes in Computer Science 382, 1989.
Google Scholar
E. M. Knorr, R. T. Ng, and D. L. Shilvock. Finding Aggregate Proximity Relationships and Commonalities in Spatial Data Minings. IEEE Transactions on Knowledge and Data Engineering, 8(6):884–897, 1996.
Article Google Scholar
E. M. Knorr, R. T. Ng, and D. L. Shilvock. Finding Boundary Shape Matching Relationships in Spatial Data. In Proceedings of the 5th International Symposium on Spatial Databases, pages 29–46, Berlin, 1997. Springer-Verlag Lecture Notes in Computer Science 1262.
Google Scholar
G. Liotta. Low Degree Algorithm for Computing and Checking Gabriel Graphs. Technical Report 96-28, Department of Computer Science, Brown University, 1996.
Google Scholar
K. Mehlhorn and S. Näher. LEDA A platform for combinatorial and geometric computing. Cambridge University Press, Cambridge, 1999.
MATH Google Scholar
R. T. Ng and J. Han. Efficient and Effective Clustering Method for Spatial Data Mining. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), pages 144–155, 1994.
Google Scholar
A. Okabe, B. N. Boots, and K. Sugihara. Spatial Tessellations: Concepts and Applications of Voronoi Diagrams. John Wiley & Sons, West Sussex, 1992.
MATH Google Scholar
S. Openshaw. A Mark 1 Geographical Analysis Machine for the automated analysis of point data sets. International Journal of GIS, 1(4):335–358, 1987.
Google Scholar
S. Openshaw. Two exploratory space-time-attribute pattern analysers relevant to GIS. In S. Fotheringham and P. Rogerson, editors, Spatial Analysis and GIS, pages 83–104. Taylor and Francis, London, 1994.
Google Scholar
J. F. Paper and D. J. Maguire. Design models and functionality in gis. Computers and Geosciences, 18(4):387–394, 1992.
Article Google Scholar
C. Posse. Hierarchical Model-Based Clustering for Large Datasets. Technical Report 363, Department of Statistics, University of Washington, 1999.
Google Scholar
P. J. Rousseeuw and A. M. Leroy. Robust Regression and Outlier Detection. John Wiley, NY, 1987.
MATH Google Scholar
E. Schikuta and M. Erhart. The BANG-Clustering System: Grid-Based Data Analysis. In X. Liu, P. Cohen, and M. Berthold, editors, Proceedings of the Second International Symposium IDA-97, pages 513–524, Berlin, 1997. Advances in Intelligent Data Analysis, Springer-Verlag Lecture Notes in Computer Science 1280.
Google Scholar
E. Son, I. Kang, and K. Li. A Spatial Data Mining Method by Clustering Analysis. In ACM-GIS 1998, pages 157–158, 1998.
Google Scholar
A. K. H. Tung, J. Hou, and J. Han. COE: Clustering with Obstacles Entities, A Preliminary Study. In Proceedings of the 4th Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2000.
Google Scholar
W. Wang, J. Yang, and R. Muntz. STING+: An Approach to Active Spatial Data Mining. In Proceedings of the International Conference on Data Engineering, pages 116–125, 1999.
Google Scholar
C. T. Zahn. Graph-Theoretical Methods for Detecting and Describing Gestalt Clusters. IEEE Transactions of Computers, 20(1):68–86, 1971.
Article MATH Google Scholar
T. Zhang, R. Ramakrishnan, and M. Livny. BIRCH: An Efficient Data Clustering Method for Very Large Databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 103–114, 1996.
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Science & Software Engineering, The University of Newcastle, NSW 2308, Callaghan, Australia
Vladimir Estivill-Castro & Ickjai Lee

Authors

Vladimir Estivill-Castro
View author publications
You can also search for this author in PubMed Google Scholar
Ickjai Lee
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

School of Informatics and Engineering, Flinders University of South Australia, P.O. Box 2100, 5001, Adelaide, South Australia
John F. Roddick
National Center for Geographic Information and Analysis, University of Maine, ME 04469-5711, Maine, Orono, USA
Kathleen Hornsby

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Estivill-Castro, V., Lee, I. (2001). AUTOCLUST+: Automatic Clustering of Point-Data Sets in the Presence of Obstacles. In: Roddick, J.F., Hornsby, K. (eds) Temporal, Spatial, and Spatio-Temporal Data Mining. TSDM 2000. Lecture Notes in Computer Science(), vol 2007. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45244-3_11

Download citation

DOI: https://doi.org/10.1007/3-540-45244-3_11
Published: 16 March 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41773-6
Online ISBN: 978-3-540-45244-7
eBook Packages: Springer Book Archive

Publish with us

Policies and ethics