Clustering Categorical Data Using an Extended Modularity Measure

Labiod, Lazhar; Grozavu, Nistor; Bennani, Younèns

doi:10.1007/978-3-642-17534-3_38

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6444)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Newman and Girvan [12] recently proposed an objective function for graph clustering, the Modularity function, which allows automatic selection of the number of clusters. Empirically, higher values of the Modularity function have been shown to correlate well with good graph clusterings. In this paper we propose an extended Modularity measure for categorical data clustering. We first establish its connection with the Relational Analysis criterion. The proposed Modularity measure introduces an automatic weighting scheme that takes into consideration the profile of each data object. A modified Relational Analysis algorithm is then presented to search for the partitions maximizing the criterion. This algorithm scales linearly with the size of large data sets and identifies natural clusters, i.e. it does not require fixing the number of clusters or the size of each cluster in advance. Experimental results indicate that the new algorithm is efficient and effective at finding both good clusterings and the appropriate number of clusters across a variety of real-world data sets.
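
For context, the standard Newman-Girvan modularity that the abstract builds on can be sketched as below. This is only the original graph measure, Q = (1/2m) Σ_ij (A_ij − k_i k_j / 2m) δ(c_i, c_j), and not the extended categorical measure proposed in the paper, whose definition is not given in this abstract. The function names and the toy graph are illustrative assumptions.

```python
def adjacency_from_edges(n, edges):
    """Build a symmetric 0/1 adjacency matrix for an undirected graph."""
    adjacency = [[0] * n for _ in range(n)]
    for u, v in edges:
        adjacency[u][v] = 1
        adjacency[v][u] = 1
    return adjacency


def modularity(adjacency, labels):
    """Newman-Girvan modularity Q of a partition of an undirected graph.

    adjacency: symmetric matrix (list of lists) of edge weights.
    labels:    labels[i] is the cluster assigned to node i.
    """
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    two_m = sum(degrees)  # twice the total edge weight
    if two_m == 0:
        raise ValueError("graph has no edges")
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                # observed weight minus the weight expected under a
                # degree-preserving random null model
                q += adjacency[i][j] - degrees[i] * degrees[j] / two_m
    return q / two_m


# Toy graph (hypothetical): two triangles joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = adjacency_from_edges(6, edges)

q_two_clusters = modularity(A, [0, 0, 0, 1, 1, 1])  # one cluster per triangle
q_one_cluster = modularity(A, [0, 0, 0, 0, 0, 0])   # everything merged
```

On this toy graph the two-triangle partition scores 5/14 while the single-cluster partition scores 0, which illustrates how comparing Q across partitions can suggest a number of clusters without fixing it beforehand.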

References

1. Agarwal, G., Kempe, D.: Modularity-maximizing graph communities via mathematical programming. The European Physical Journal B 66(33), 409–418 (2008)
2. Asuncion, A., Newman, D.J.: UCI Machine Learning Repository. University of California, School of Information and Computer Science, Irvine, CA (2007), http://www.ics.uci.edu/~mlearn/MLRepository.html
3. Barbara, D., Couto, J., Li, Y.: COOLCAT: an entropy-based algorithm for categorical clustering. In: Proceedings of the Eleventh ACM CIKM Conference, pp. 582–589 (2002)
4. Bock, H.-H.: Probabilistic aspects in cluster analysis. In: Opitz, O. (ed.) Conceptual and Numerical Analysis of Data, pp. 12–44. Springer, Berlin (1989)
5. Celeux, G., Govaert, G.: Clustering criteria for discrete data and latent class models. Journal of Classification 8, 157–176 (1991)
6. Ganti, V., Gehrke, J., Ramakrishnan, R.: CACTUS - clustering categorical data using summaries. In: Proceedings of the Fifth ACM SIGKDD Conference, pp. 73–83 (1999)
7. Gibson, D., Kleinberg, J., Raghavan, P.: Clustering categorical data: An approach based on dynamical systems. In: Proceedings of the 24th VLDB Conference, pp. 311–322 (1998)
8. Guha, S., Rastogi, R., Shim, K.: ROCK: A robust clustering algorithm for categorical attributes. Information Systems 25, 345–366 (2000)
9. Huang, Z.: Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Mining and Knowledge Discovery 2, 283–304 (1998)
10. Marcotorchino, J.F.: Relational analysis theory as a general approach to data analysis and data fusion. In: Cognitive Systems with Interactive Sensors (2006)
11. Marcotorchino, J.F., Michaud, P.: Optimisation en analyse ordinale des données (1978) (in Masson)
12. Newman, M., Girvan, M.: Finding and evaluating community structure in networks. Physical Review E 69, 26113 (2004)
13. White, S., Smyth, P.: A spectral clustering approach to finding communities in graphs. In: SDM, pp. 76–84 (2005)

Author information

Authors and Affiliations

LIPN-UMR 7030, Université Paris 13, 99, av. J-B Clément, 93430, Villetaneuse, France
Lazhar Labiod, Nistor Grozavu & Younèns Bennani

Editor information

Editors and Affiliations

School of Information Technology, Murdoch University, 6150, Murdoch, WA, Australia
Kok Wai Wong
The Australian National University, 0200, Canberra, ACT, Australia
B. Sumudu U. Mendis
School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, Northfields Avenue, 2522, P.O. Box, Wollongong, NSW, Australia
Abdesselam Bouzerdoum

Cite this paper

Labiod, L., Grozavu, N., Bennani, Y. (2010). Clustering Categorical Data Using an Extended Modularity Measure. In: Wong, K.W., Mendis, B.S.U., Bouzerdoum, A. (eds) Neural Information Processing. Models and Applications. ICONIP 2010. Lecture Notes in Computer Science, vol 6444. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17534-3_38

DOI: https://doi.org/10.1007/978-3-642-17534-3_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17533-6
Online ISBN: 978-3-642-17534-3
eBook Packages: Computer Science, Computer Science (R0)
