MCC: a Multiple Consensus Clustering Framework

Journal of Classification

Abstract

Consensus clustering has emerged as an important extension of the classical clustering problem. Given a set of input clusterings of a dataset, consensus clustering aims to find a single final clustering that is, in some sense, a better fit than any of the existing clusterings. Generating a single consensus clustering has a significant drawback, however: the input clusterings may differ substantially from one another. In this paper, we develop a new framework, called Multiple Consensus Clustering (MCC), to explore multiple clustering views of a given dataset from a set of input clusterings. Instead of generating a single consensus, we propose two approaches for obtaining multiple consensuses, both of which first partition the set of input clusterings. The first employs the meta clustering method. The second builds a hierarchical tree and then applies a dynamic programming algorithm, guided by the modularity measure, to extract a flat partition from the tree. Multiple consensuses are finally obtained by applying consensus clustering algorithms to each cluster of the partition. Extensive experimental results on 11 real-world datasets and a case study on a Protein-Protein Interaction (PPI) dataset demonstrate the effectiveness of the MCC framework.
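The overall pipeline (group the input clusterings, then build one consensus per group) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity measure, the grouping rule, or the consensus function, so the sketch assumes the Rand index as the similarity between clusterings, a simple threshold-linkage grouping in place of meta clustering or the modularity-guided dynamic program, and a majority-vote co-association consensus. All function names and the threshold value are hypothetical.

```python
from itertools import combinations


def rand_index(a, b):
    """Fraction of object pairs on which two labelings agree (same/different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def _components(n, linked_pairs):
    """Return a connected-component root for each of n items (union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i, j in linked_pairs:
        parent[find(i)] = find(j)
    return [find(i) for i in range(n)]


def group_clusterings(clusterings, threshold):
    """Assumed grouping rule: link two clusterings if their Rand index >= threshold."""
    m = len(clusterings)
    links = [(i, j) for i, j in combinations(range(m), 2)
             if rand_index(clusterings[i], clusterings[j]) >= threshold]
    groups = {}
    for idx, root in enumerate(_components(m, links)):
        groups.setdefault(root, []).append(idx)
    return list(groups.values())


def consensus(clusterings):
    """Assumed consensus: link objects co-clustered in a strict majority of clusterings."""
    n = len(clusterings[0])
    links = [(i, j) for i, j in combinations(range(n), 2)
             if 2 * sum(c[i] == c[j] for c in clusterings) > len(clusterings)]
    ids = {}
    # Relabel components as 0, 1, 2, ... in order of first appearance.
    return [ids.setdefault(root, len(ids)) for root in _components(n, links)]


# Toy input: four clusterings of six objects, forming two distinct "views".
inputs = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0, 0],
]
groups = group_clusterings(inputs, threshold=0.6)
views = [consensus([inputs[i] for i in g]) for g in groups]
```

On this toy input the first two clusterings end up in one group and the last two in another, so two consensus clusterings are returned rather than a single compromise between incompatible views.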

Notes

  1. The software can be downloaded from http://glaros.dtc.umn.edu/gkhome/views/metis/.

Funding

This work is partially supported by the National Science Foundation under grants DBI-0850203, HRD-0833093, CNS-1126619, IIS-1213026, and CNS-1461926; by the U.S. Department of Homeland Security VACCINE Center under Award Number 2009-ST-061-CI0001; and by an FIU Dissertation Year Fellowship.

Author information

Corresponding author

Correspondence to Tao Li.

Cite this article

Li, T., Zhang, Y., Wang, D. et al. MCC: a Multiple Consensus Clustering Framework. J Classif 36, 414–434 (2019). https://doi.org/10.1007/s00357-019-09318-4

  • DOI: https://doi.org/10.1007/s00357-019-09318-4
