Clustering Categorical Data Using an Extended Modularity Measure

  • Conference paper
Neural Information Processing. Models and Applications (ICONIP 2010)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6444)

Abstract

Newman and Girvan [12] recently proposed an objective function for graph clustering, called the Modularity function, which allows automatic selection of the number of clusters. Empirically, higher values of the Modularity function have been shown to correlate well with good graph clusterings. In this paper we propose an extended Modularity measure for categorical data clustering, after first establishing its connection with the Relational Analysis criterion. The proposed measure introduces an automatic weighting scheme that takes into consideration the profile of each data object. A modified Relational Analysis algorithm is then presented to search for the partitions that maximize the criterion. This algorithm scales linearly with large data sets and identifies natural clusters, i.e. it requires fixing neither the number of clusters nor the size of each cluster. Experimental results indicate that the new algorithm is efficient and effective at finding both good clusterings and the appropriate number of clusters across a variety of real-world data sets.
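The abstract does not give the extended measure or the modified algorithm in detail. What follows is only a minimal sketch under stated assumptions, not the authors' method. It applies the standard Newman–Girvan modularity, Q = (1/2m) Σ_ij [A_ij − k_i k_j / 2m] δ(c_i, c_j), to a hypothetical "agreement graph". In that graph, A_ij counts the attributes on which objects i and j share a value, and self-agreements are excluded. The sketch omits the profile-based weighting that the paper introduces. The single-pass assignment in `greedy_partition` is a hypothetical heuristic in the spirit of Relational Analysis: each object joins the existing cluster with the largest positive modularity gain, or else opens a new cluster, so the number of clusters is never fixed in advance. All function names are illustrative.

```python
from collections import Counter
from itertools import combinations

def agreement_matrix(rows):
    """A[i][j] = number of attributes on which objects i and j agree (i != j).
    Assumption of this sketch: self-agreements are dropped (A[i][i] = 0)."""
    n = len(rows)
    A = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        A[i][j] = A[j][i] = sum(a == b for a, b in zip(rows[i], rows[j]))
    return A

def modularity(A, labels):
    """Standard (unweighted-extension) Newman-Girvan modularity of a partition."""
    n = len(A)
    k = [sum(row) for row in A]          # weighted degrees
    two_m = sum(k)                       # 2m = total edge weight counted twice
    if two_m == 0:
        return 0.0
    q = sum(A[i][j] - k[i] * k[j] / two_m
            for i in range(n) for j in range(n)
            if labels[i] == labels[j])
    return q / two_m

def greedy_partition(rows):
    """Hypothetical single-pass heuristic: each object joins the cluster with
    the largest positive modularity gain, otherwise it opens a new cluster.
    Per-cluster value counts ("profiles") give sum_{j in c} A[i][j] without
    building the n x n matrix."""
    if not rows:
        return []
    p = len(rows[0])
    col_counts = [Counter(r[a] for r in rows) for a in range(p)]
    # Degree of i in the agreement graph: its agreements with all other objects.
    k = [sum(col_counts[a][r[a]] - 1 for a in range(p)) for r in rows]
    two_m = sum(k)
    profiles, degree_sums, labels = [], [], []
    for i, r in enumerate(rows):
        best_gain, best_c = 0.0, None
        for c, prof in enumerate(profiles):
            links = sum(prof[a][r[a]] for a in range(p))   # sum_{j in c} A[i][j]
            # Gain (up to a positive factor) of joining c instead of staying alone.
            gain = links - k[i] * degree_sums[c] / two_m if two_m else 0.0
            if gain > best_gain:
                best_gain, best_c = gain, c
        if best_c is None:                                  # open a new cluster
            profiles.append([Counter() for _ in range(p)])
            degree_sums.append(0)
            best_c = len(profiles) - 1
        for a in range(p):
            profiles[best_c][a][r[a]] += 1
        degree_sums[best_c] += k[i]
        labels.append(best_c)
    return labels

# Toy categorical data (illustrative only).
data = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("blue", "large", "square"),
    ("blue", "large", "square"),
    ("blue", "large", "rect"),
]
labels = greedy_partition(data)
print(labels)                                                 # [0, 0, 0, 1, 1, 1]
print(round(modularity(agreement_matrix(data), labels), 3))   # 0.5
```

Keeping per-cluster value counts makes each gain cost O(p) per cluster for p attributes, instead of O(n). This is one way to avoid the quadratic similarity matrix. It is not claimed to be how the paper obtains its linear behaviour.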

References

  1. Agarwal, G., Kempe, D.: Modularity-maximizing graph communities via mathematical programming. The European Physical Journal B 66(3), 409–418 (2008)

  2. Asuncion, A., Newman, D.J.: UCI Machine Learning Repository. University of California, School of Information and Computer Science, Irvine, CA (2007), http://www.ics.uci.edu/~mlearn/MLRepository.html

  3. Barbara, D., Couto, J., Li, Y.: COOLCAT: an entropy-based algorithm for categorical clustering. In: Proceedings of the Eleventh ACM CIKM Conference, pp. 582–589 (2002)

  4. Bock, H.-H.: Probabilistic aspects in cluster analysis. In: Opitz, O. (ed.) Conceptual and Numerical Analysis of Data, pp. 12–44. Springer, Berlin (1989)

  5. Celeux, G., Govaert, G.: Clustering criteria for discrete data and latent class models. Journal of Classification 8, 157–176 (1991)

  6. Ganti, V., Gehrke, J., Ramakrishnan, R.: CACTUS - clustering categorical data using summaries. In: Proceedings of the Fifth ACM SIGKDD Conference, pp. 73–83 (1999)

  7. Gibson, D., Kleinberg, J., Raghavan, P.: Clustering categorical data: An approach based on dynamical systems. In: Proceedings of the 24th VLDB Conference, pp. 311–322 (1998)

  8. Guha, S., Rastogi, R., Shim, K.: ROCK: A robust clustering algorithm for categorical attributes. Information Systems 25, 345–366 (2000)

  9. Huang, Z.: Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Mining and Knowledge Discovery 2, 283–304 (1998)

  10. Marcotorchino, J.F.: Relational analysis theory as a general approach to data analysis and data fusion. In: Cognitive Systems with Interactive Sensors (2006)

  11. Marcotorchino, J.F., Michaud, P.: Optimisation en analyse ordinale des données. Masson (1978)

  12. Newman, M., Girvan, M.: Finding and evaluating community structure in networks. Physical Review E 69, 026113 (2004)

  13. White, S., Smyth, P.: A spectral clustering approach to finding communities in graphs. In: SDM, pp. 76–84 (2005)

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Labiod, L., Grozavu, N., Bennani, Y. (2010). Clustering Categorical Data Using an Extended Modularity Measure. In: Wong, K.W., Mendis, B.S.U., Bouzerdoum, A. (eds) Neural Information Processing. Models and Applications. ICONIP 2010. Lecture Notes in Computer Science, vol 6444. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17534-3_38

  • DOI: https://doi.org/10.1007/978-3-642-17534-3_38

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-17533-6

  • Online ISBN: 978-3-642-17534-3

  • eBook Packages: Computer Science, Computer Science (R0)
