
Refining Pairwise Similarity Matrix for Cluster Ensemble Problem with Cluster Relations

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5255)

Abstract

Cluster ensemble methods have recently emerged as powerful techniques that aggregate several input data clusterings into a single output clustering with improved robustness and stability. This paper presents two new similarity matrices, which are empirically evaluated and compared against the standard co-association matrix on six datasets (both artificial and real) using four different combination methods and six clustering validity criteria. In all cases, the results suggest that the new link-based similarity matrices efficiently extract the information embedded in the input clusterings and regularly yield higher clustering quality than the co-association matrix.
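For readers unfamiliar with the baseline mentioned in the abstract, the following is a minimal sketch of the standard co-association matrix, in which each entry is the fraction of input clusterings that place two samples in the same cluster. It does not implement the paper's two link-based matrices, which the abstract does not specify. The function name `co_association` and the toy `ensemble` data are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations


def co_association(clusterings):
    """Return the n x n co-association matrix for a list of label vectors.

    Entry [i][j] is the fraction of input clusterings that assign
    samples i and j to the same cluster.
    """
    n = len(clusterings[0])
    if any(len(labels) != n for labels in clusterings):
        raise ValueError("all clusterings must label the same samples")
    m = len(clusterings)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = 1.0  # a sample always co-occurs with itself
    for i, j in combinations(range(n), 2):
        agree = sum(1 for labels in clusterings if labels[i] == labels[j])
        matrix[i][j] = matrix[j][i] = agree / m
    return matrix


# Three hypothetical base clusterings of four samples (assumed data).
ensemble = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
ca = co_association(ensemble)
```

Only agreement within each clustering matters, so the matrix is unaffected by how the individual clusterings happen to number their clusters. A consensus clustering can then be obtained by running any similarity-based method on `ca`.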




Copyright information

© 2008 Springer Berlin Heidelberg

About this paper

Cite this paper

Iam-on, N., Boongoen, T., Garrett, S. (2008). Refining Pairwise Similarity Matrix for Cluster Ensemble Problem with Cluster Relations. In: Boulicaut, JF., Berthold, M.R., Horváth, T. (eds) Discovery Science. DS 2008. Lecture Notes in Computer Science, vol 5255. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88411-8_22


  • DOI: https://doi.org/10.1007/978-3-540-88411-8_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-88410-1

  • Online ISBN: 978-3-540-88411-8

  • eBook Packages: Computer Science, Computer Science (R0)
