Abstract
Newman and Girvan [12] recently proposed an objective function for graph clustering, called the Modularity function, which allows automatic selection of the number of clusters. Empirically, higher values of the Modularity function have been shown to correlate well with good graph clusterings. In this paper we propose an extended Modularity measure for categorical data clustering, after first establishing its connection with the Relational Analysis criterion. The proposed Modularity measure introduces an automatic weighting scheme that takes into consideration the profile of each data object. A modified Relational Analysis algorithm is then presented to search for the partitions that maximize the criterion. The algorithm scales linearly with the size of the data set and identifies natural clusters, i.e., it does not require fixing the number of clusters or the size of each cluster in advance. Experimental results indicate that the new algorithm is efficient and effective at finding both good clusterings and the appropriate number of clusters across a variety of real-world data sets.
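For orientation, the standard Newman-Girvan Modularity of a partition of an undirected graph can be written as Q = (1/2m) Σ_ij (A_ij − k_i k_j / 2m) δ(c_i, c_j), where A is the adjacency matrix, k_i the degree of node i, m the number of edges, and c_i the cluster of node i. The sketch below computes this graph version only; it is not the extended measure for categorical data proposed in the paper, and the function name, the toy graph, and the labelings are illustrative assumptions.

```python
def modularity(adjacency, labels):
    """Standard Newman-Girvan modularity Q of a partition.

    adjacency: symmetric 0/1 matrix (list of lists) of an undirected graph.
    labels:    labels[i] is the cluster of node i.
    Note: this is the classical graph measure, not the paper's extension.
    """
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    two_m = sum(degrees)  # sum of degrees equals 2m for an undirected graph
    if two_m == 0:
        return 0.0
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                # observed edge minus the expected value under random wiring
                q += adjacency[i][j] - degrees[i] * degrees[j] / two_m
    return q / two_m


# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = [[0] * 6 for _ in range(6)]
for u, v in edges:
    adjacency[u][v] = 1
    adjacency[v][u] = 1

q_two_clusters = modularity(adjacency, [0, 0, 0, 1, 1, 1])
q_one_cluster = modularity(adjacency, [0, 0, 0, 0, 0, 0])
```

On this toy graph the two-triangle partition scores higher than the single-cluster partition, which illustrates how maximizing Q can select the number of clusters without fixing it beforehand.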
References
Agarwal, G., Kempe, D.: Modularity-maximizing graph communities via mathematical programming. The European Physical Journal B 66(3), 409–418 (2008)
Asuncion, A., Newman, D.J.: UCI Machine Learning Repository. University of California, School of Information and Computer Science, Irvine, CA (2007), http://www.ics.uci.edu/~mlearn/MLRepository.html
Barbara, D., Couto, J., Li, Y.: COOLCAT: an entropy-based algorithm for categorical clustering. In: Proceedings of the Eleventh ACM CIKM Conference, pp. 582–589 (2002)
Bock, H.-H.: Probabilistic aspects in cluster analysis. In: Opitz, O. (ed.) Conceptual and Numerical Analysis of Data, pp. 12–44. Springer, Berlin (1989)
Celeux, G., Govaert, G.: Clustering criteria for discrete data and latent class models. Journal of Classification 8, 157–176 (1991)
Ganti, V., Gehrke, J., Ramakrishnan, R.: CACTUS - clustering categorical data using summaries. In: Proceedings of the Fifth ACM SIGKDD Conference, pp. 73–83 (1999)
Gibson, D., Kleinberg, J., Raghavan, P.: Clustering categorical data: An approach based on dynamical systems. In: Proceedings of the 24th VLDB Conference, pp. 311–322 (1998)
Guha, S., Rastogi, R., Shim, K.: ROCK: A robust clustering algorithm for categorical attributes. Information Systems 25, 345–366 (2000)
Huang, Z.: Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Mining and Knowledge Discovery 2, 283–304 (1998)
Marcotorchino, J.F.: Relational analysis theory as a general approach to data analysis and data fusion. In: Cognitive Systems with Interactive Sensors (2006)
Marcotorchino, J.F., Michaud, P.: Optimisation en analyse ordinale des données. Masson (1978)
Newman, M., Girvan, M.: Finding and evaluating community structure in networks. Physical Review E 69, 026113 (2004)
White, S., Smyth, P.: A spectral clustering approach to finding communities in graphs. In: SDM, pp. 76–84 (2005)
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Labiod, L., Grozavu, N., Bennani, Y. (2010). Clustering Categorical Data Using an Extended Modularity Measure. In: Wong, K.W., Mendis, B.S.U., Bouzerdoum, A. (eds) Neural Information Processing. Models and Applications. ICONIP 2010. Lecture Notes in Computer Science, vol 6444. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17534-3_38
DOI: https://doi.org/10.1007/978-3-642-17534-3_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17533-6
Online ISBN: 978-3-642-17534-3
eBook Packages: Computer Science, Computer Science (R0)