How to “alternatize” a clustering algorithm

Hossain, M. Shahriar; Ramakrishnan, Naren; Davidson, Ian; Watson, Layne T.

doi:10.1007/s10618-012-0288-4


Published: 28 August 2012

Volume 27, pages 193–224, (2013)

Data Mining and Knowledge Discovery

M. Shahriar Hossain¹, Naren Ramakrishnan², Ian Davidson⁴ & Layne T. Watson²,³


Abstract

Given a clustering algorithm, how can we adapt it to find multiple, nonredundant, high-quality clusterings? We focus on algorithms based on vector quantization and describe a framework for automatic ‘alternatization’ of such algorithms. Our framework works in both simultaneous and sequential learning formulations and can mine an arbitrary number of alternative clusterings. We demonstrate its applicability to various clustering algorithms—k-means, spectral clustering, constrained clustering, and co-clustering—and effectiveness in mining a variety of datasets.
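To make "nonredundant" concrete, here is a minimal sketch of one common way to score how much two clusterings of the same points overlap: build their contingency table and compute normalized mutual information (NMI). This is an illustrative assumption, not the objective or procedure used in the article; the function names `contingency` and `normalized_mutual_information` are hypothetical.

```python
from collections import Counter
from math import log


def contingency(labels_a, labels_b):
    """Count the points falling in each (cluster of A, cluster of B) pair."""
    return Counter(zip(labels_a, labels_b))


def normalized_mutual_information(labels_a, labels_b):
    """NMI between two labelings: near 1 means redundant, near 0 means unrelated.

    Illustrative measure only; not claimed to be the article's criterion.
    """
    n = len(labels_a)
    joint = contingency(labels_a, labels_b)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)

    # Mutual information computed from the contingency table.
    mi = sum(
        (c / n) * log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    # Entropy of each clustering, used for normalization.
    h_a = -sum((c / n) * log(c / n) for c in count_a.values())
    h_b = -sum((c / n) * log(c / n) for c in count_b.values())
    if h_a == 0 or h_b == 0:
        return 0.0  # a single-cluster labeling carries no information
    return mi / (h_a * h_b) ** 0.5


first = [0, 0, 1, 1]
relabeled = [1, 1, 0, 0]    # same partition with cluster names swapped
alternative = [0, 1, 0, 1]  # splits the points along a different direction

score_redundant = normalized_mutual_information(first, relabeled)
score_alternative = normalized_mutual_information(first, alternative)
```

In this toy case the relabeled clustering scores as fully redundant with the first, while the alternative one has a uniform contingency table and scores as unrelated.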



Author information

Authors and Affiliations

Department of Mathematics and Computer Science, Virginia State University, 1 Hayden Drive, Petersburg, VA, 23806, USA
M. Shahriar Hossain
Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA
Naren Ramakrishnan & Layne T. Watson
Department of Mathematics, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA
Layne T. Watson
Department of Computer Science, University of California, Davis, CA, 95616, USA
Ian Davidson


Corresponding author

Correspondence to M. Shahriar Hossain.

Additional information

Responsible editor: Charu Aggarwal.


About this article

Cite this article

Hossain, M.S., Ramakrishnan, N., Davidson, I. et al. How to “alternatize” a clustering algorithm. Data Min Knowl Disc 27, 193–224 (2013). https://doi.org/10.1007/s10618-012-0288-4


Received: 12 September 2011
Accepted: 04 August 2012
Published: 28 August 2012
Issue Date: September 2013
DOI: https://doi.org/10.1007/s10618-012-0288-4
