Abstract
Representative-based clustering algorithms are popular because they are relatively fast and rest on a sound theoretical foundation. However, the clusters they can find are limited to convex shapes, and their results are highly sensitive to initialization. This paper proposes MOSAIC, a novel agglomerative clustering algorithm that greedily merges neighboring clusters to maximize a given fitness function. MOSAIC uses Gabriel graphs to determine which clusters are neighbors, and it approximates non-convex shapes as unions of small clusters computed by a representative-based clustering algorithm. Experimental results show that this technique yields clusters of higher quality than running a representative-based clustering algorithm on its own. Given a suitable fitness function, MOSAIC can detect clusters of arbitrary shape. In addition, MOSAIC can handle high-dimensional data.
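The merging idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`gabriel_edges`, `silhouette`, `greedy_merge`), the choice of a silhouette-style score as the fitness function, the brute-force Gabriel test, and the rule of stopping as soon as no merge improves the fitness are all assumptions made for illustration. The abstract only states that neighboring clusters are merged greedily to maximize "a given fitness function".

```python
from itertools import combinations
from math import dist


def sq_dist(p, q):
    """Squared Euclidean distance between two points of any dimension."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def gabriel_edges(points):
    """Brute-force Gabriel graph: (i, j) is an edge if no third point lies
    strictly inside the ball whose diameter is the segment from i to j."""
    edges = []
    for i, j in combinations(range(len(points)), 2):
        d_ij = sq_dist(points[i], points[j])
        blocked = any(
            sq_dist(points[i], points[k]) + sq_dist(points[j], points[k]) < d_ij
            for k in range(len(points))
            if k not in (i, j)
        )
        if not blocked:
            edges.append((i, j))
    return edges


def silhouette(clustering):
    """Mean silhouette width (assumed fitness; higher is better)."""
    if len(clustering) < 2:
        return -1.0  # convention chosen here: a single cluster scores worst
    scores = []
    for idx, cluster in enumerate(clustering):
        for p in cluster:
            if len(cluster) == 1:
                scores.append(0.0)
                continue
            # Mean distance to the other members (dist(p, p) contributes 0).
            a = sum(dist(p, q) for q in cluster) / (len(cluster) - 1)
            # Mean distance to the closest other cluster.
            b = min(
                sum(dist(p, q) for q in other) / len(other)
                for k, other in enumerate(clustering)
                if k != idx
            )
            scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def merged_clustering(clusters, a, b):
    """Clustering that results from merging clusters with ids a and b."""
    rest = [c for cid, c in clusters.items() if cid not in (a, b)]
    return rest + [clusters[a] + clusters[b]]


def greedy_merge(representatives, members, fitness=silhouette):
    """representatives[i] represents the initial small cluster members[i].
    Repeatedly merge the neighboring pair that gives the best fitness;
    stop when no merge improves it (an assumed stopping rule)."""
    clusters = {i: list(m) for i, m in enumerate(members)}
    label = list(range(len(representatives)))  # cluster id per representative
    edges = gabriel_edges(representatives)
    best = fitness(list(clusters.values()))
    while len(clusters) > 1:
        # Two clusters are neighbors if any of their representatives are.
        pairs = {
            tuple(sorted((label[i], label[j])))
            for i, j in edges
            if label[i] != label[j]
        }
        if not pairs:
            break
        score, a, b = max(
            (fitness(merged_clustering(clusters, a, b)), a, b) for a, b in pairs
        )
        if score <= best:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
        label = [a if l == b else l for l in label]
        best = score
    return list(clusters.values())


# Hypothetical toy input: four small clusters, two on the left, two on the right.
reps = [(0.0, 0.5), (1.0, 0.5), (10.0, 0.5), (11.0, 0.5)]
members = [
    [(0.0, 0.0), (0.0, 1.0)],
    [(1.0, 0.0), (1.0, 1.0)],
    [(10.0, 0.0), (10.0, 1.0)],
    [(11.0, 0.0), (11.0, 1.0)],
]
result = greedy_merge(reps, members)
```

On this toy input the two left-hand clusters and the two right-hand clusters are merged, and the final merge into a single cluster is rejected because it lowers the assumed fitness. The cubic-time Gabriel test is only suitable for a small number of representatives; since the graph is built over representatives rather than all data points, that number is typically modest.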
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Choo, J., Jiamthapthaksin, R., Chen, Cs., Celepcikay, O.U., Giusti, C., Eick, C.F. (2007). MOSAIC: A Proximity Graph Approach for Agglomerative Clustering. In: Song, I.Y., Eder, J., Nguyen, T.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2007. Lecture Notes in Computer Science, vol 4654. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74553-2_21
DOI: https://doi.org/10.1007/978-3-540-74553-2_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74552-5
Online ISBN: 978-3-540-74553-2
eBook Packages: Computer Science (R0)