Abstract
Given a clustering algorithm, how can we adapt it to find multiple, nonredundant, high-quality clusterings? We focus on algorithms based on vector quantization and describe a framework for automatic ‘alternatization’ of such algorithms. Our framework works in both simultaneous and sequential learning formulations and can mine an arbitrary number of alternative clusterings. We demonstrate its applicability to various clustering algorithms—k-means, spectral clustering, constrained clustering, and co-clustering—and effectiveness in mining a variety of datasets.
Similar content being viewed by others
References
Agrawal R, Gehrke J, Gunopulos D, Raghavan P (1998) Automatic subspace clustering of high dimensional data for data mining applications. SIGMOD Rec 27(2): 94–105
Bae E, Bailey J (2006) COALA: a novel approach for the extraction of an alternate clustering of high quality and high dissimilarity. In: ICDM ’06, pp 53–62
Banerjee A, Merugu S, Dhillon IS, Ghosh J (2005) Clustering with Bregman divergences. J Mach Learn Res 6: 1705–1749
Banerjee A, Basu S, Merugu S (2007) Multi-way clustering on relation graphs. In: SDM ’07, pp 225–334
Brohee S, van Helden J (2006) Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinform 7: 488
Caruana R, Elhawary M, Nguyen N, Smith C (2006) Meta clustering. In: ICDM ’06, pp 107–118
Chakrabarti D, Papadimitriou S, Modha DS, Faloutsos C (2004) Fully automatic cross-associations. In: KDD ’04, pp 79–88
Cheng C, Fu AW, Zhang Y (1999) Entropy-based subspace clustering for mining numerical data. In: KDD ’99, pp 84–93
Conn AR, Gould NIM, Toint PL (1992) LANCELOT: a Fortran package for large-scale nonlinear optimization (release A), vol 17. Springer, New York
Cui Y, Fern X, Dy JG (2007) Non-redundant multi-view clustering via orthogonalization. In: ICDM ’07, pp 133–142
Dang X, Bailey J (2010a) A hierarchical information theoretic technique for the discovery of non-linear alternative clusterings. In: KDD ’10, pp 573–582
Dang X, Bailey J (2010b) Generation of alternative clusterings using the CAMI approach. In: SDM ’10, pp 118–129
Davidson I, Basu S (2007) A survey of clustering with instance level constraints. In: TKDD, pp 1–41
Davidson I, Qi Z (2008) Finding alternative clusterings using constraints. In: ICDM ’08, pp 773–778
Dhillon IS (2001) Co-clustering documents and words using bipartite spectral graph partitioning. In: KDD ’01, pp 269–274
Dhillon IS, Mallela S, Modha DS (2003) Information theoretic co-clustering. In: KDD ’03, pp 89–98
Dunn JC (1974) Well-separated clusters and optimal fuzzy partitions. J Cybernet 4(1): 95–104
Friedman N, Mosenzon O, Slonim N, Tishby N (2001) Multivariate information bottleneck. In: UAI ’01, pp 152–161
Gondek D, Hofmann T (2005) Non-redundant clustering with conditional ensembles. In: KDD ’05, pp 70–77
Gondek D, Hofmann T (2007) Non-redundant data clustering. Knowl Inf Syst 12(1): 1–24
Gondek D, Vaithyanathan S, Garg A (2005) Clustering with model-level constraints. In: SDM ’05, pp 126–137
Govaert G, Nadif M (2003) Clustering with block mixture models. Pattern Recog Lett 36(2): 463–473
Greenacre M. (1988) Clustering the rows and columns of a contingency table. J Classif 5(1): 39–51
Hossain MS, Tadepalli S, Watson LT, Davidson I, Helm RF, Ramakrishnan N (2010) Unifying dependent clustering and disparate clustering for non-homogeneous data. In: KDD ’10, pp 593–602
Jain P, Meka R, Dhillon IS (2008) Simultaneous unsupervised learning of disparate clusterings. In: SDM ’08, pp 858–869
Kaski S, Nikkilä J, Sinkkonen J, Lahti L, Knuuttila JEA, Roos C (2005) Associative clustering for exploring dependencies between functional genomics data sets. IEEE/ACM TCBB 2(3): 203–216
Kullback S, Gokhale D (1978) The information in contingency tables. Marcel Dekker Inc., New York
Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22(1): 79–86
Li T, Ding C, Jordan MI (2007) Solving consensus and semi-supervised clustering problems using nonnegative matrix factorization. In: ICDM ’07, pp 577–582
Malakooti B, Yang Z (2004) Clustering and group selection of multiple criteria alternatives with application to space-based networks. IEEE Trans SMC B 34(1): 40–51
Miettinen K, Salminen P (1999) Decision-aid for discrete multiple criteria decision making problems with imprecise data. Eur J Oper Res 119(1): 50–60
Monti S, Tamayo P, Mesirov J, Golub T (2003) Consensus clustering: A resampling-based method for class discovery and visualization of gene expression microarray data. Mach Learn 52: 91–118
Nadif M, Govaert G (2005) Block clustering of contingency table and mixture model. In: IDA ’05, pp 249–259
Niu D, Dy JG, Jordan MI (2010) Multiple non-redundant spectral clustering views. In: ICML ’10, pp 831–838
Qi Z, Davidson I (2009) A principled and flexible framework for finding alternative clusterings. In: KDD ’09, pp 717–726
Ross DA, Zemel RS (2006) Learning parts-based representations of data. J Mach Learn Res 7: 2369–2397
Shi J, Malik J (2000) Normalized cuts and image segmentation. IEE Trans Pattern Anal Meach Intel 22(8): 888–905
Sinkkonen J, Kaski S. (2002) Clustering based on conditional distributions in an auxiliary space. Neural Comput 14(1): 217–239
Sinkkonen J, Kaski S, Nikkilä J (2002) Discriminative clustering: optimal contingency tables by learning metrics. In: ECML ’02, pp 418–430
Sinkkonen J, Nikkilä J, Lahti L, Kaski S (2004) Associative clustering. In: ECML ’04, pp 396–406
Strehl A, Ghosh J (2003) Cluster ensembles—a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3: 583–617
Tadepalli S (2009) Schemas of clustering. PhD thesis, Virginia Tech, Blacksburg
Tan P-N, Steinbach M, Kumar V (2005) Introduction to data mining. Addison-Wesley, Boston
Vinh NX, Epps J (2010) mincentropy: a novel information theoretic approach for the generation of alternative clusterings. In: ICDM ’10, pp 521–530
Wang X, Davidson I (2010) Flexible constrained spectral clustering. In: KDD ’10 pp 563–572
Zeng Y, Tang J, Garcia-Frias J, Gao GR (2002) An adaptive meta-clustering approach: combining the information from different clustering results. In: CSB ’02, pp 276–287
Zhang W, Surve A, Fern X, Dietterich T (2009) Learning non-redundant codebooks for classifying complex objects. In: ICML ’09, pp 1241–1248
Author information
Authors and Affiliations
Corresponding author
Additional information
Responsible editor: Charu Aggarwal.
Rights and permissions
About this article
Cite this article
Hossain, M.S., Ramakrishnan, N., Davidson, I. et al. How to “alternatize” a clustering algorithm. Data Min Knowl Disc 27, 193–224 (2013). https://doi.org/10.1007/s10618-012-0288-4
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10618-012-0288-4