How to “alternatize” a clustering algorithm

Data Mining and Knowledge Discovery

Abstract

Given a clustering algorithm, how can we adapt it to find multiple, nonredundant, high-quality clusterings? We focus on algorithms based on vector quantization and describe a framework for automatic ‘alternatization’ of such algorithms. Our framework works in both simultaneous and sequential learning formulations and can mine an arbitrary number of alternative clusterings. We demonstrate its applicability to various clustering algorithms—k-means, spectral clustering, constrained clustering, and co-clustering—and effectiveness in mining a variety of datasets.
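
To make "nonredundant" concrete, the sketch below scores how much two clusterings of the same points overlap, using a contingency table and the mutual information computed from it. This is an illustrative measure chosen for this example, not necessarily the objective the framework optimizes; the function names `contingency` and `mutual_information` and the toy labelings are assumptions introduced here.

```python
from collections import Counter
from math import log


def contingency(labels_a, labels_b):
    """Count the points that fall in each (cluster in A, cluster in B) pair."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same points")
    return Counter(zip(labels_a, labels_b))


def mutual_information(labels_a, labels_b):
    """Mutual information in nats; 0 means the two labelings are independent."""
    n = len(labels_a)
    joint = contingency(labels_a, labels_b)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    mi = 0.0
    for (a, b), n_ab in joint.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (n_ab / n) * log(n_ab * n / (count_a[a] * count_b[b]))
    return mi


# Hypothetical toy labelings of four points.
first = [0, 0, 1, 1]
relabeled = [1, 1, 0, 0]    # same partition with cluster names swapped
alternative = [0, 1, 0, 1]  # cuts across every cluster of `first`

redundant_score = mutual_information(first, relabeled)
alternative_score = mutual_information(first, alternative)
```

A clustering that merely renames the clusters of `first` scores high (fully redundant), while one whose contingency table with `first` is uniform scores zero, which is the sense in which it counts as an alternative.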

Author information

Correspondence to M. Shahriar Hossain.

Additional information

Responsible editor: Charu Aggarwal.

About this article

Cite this article

Hossain, M.S., Ramakrishnan, N., Davidson, I. et al. How to “alternatize” a clustering algorithm. Data Min Knowl Disc 27, 193–224 (2013). https://doi.org/10.1007/s10618-012-0288-4

  • DOI: https://doi.org/10.1007/s10618-012-0288-4
