Abstract
Evidence accumulation clustering (EAC) is a clustering combination method in which a pairwise similarity matrix (the so-called co-association matrix) is learnt from a clustering ensemble. This co-association matrix counts how often each pair of objects co-occurs in the same cluster, thus avoiding the cluster correspondence problem faced by many other clustering combination approaches. Starting from the observation that co-occurrences are a special type of dyads, we propose to model co-association using a generative aspect model for dyadic data. Under the proposed model, extracting a consensus clustering corresponds to solving a maximum likelihood estimation problem, which we address using the expectation-maximization (EM) algorithm. We refer to the resulting method as the probabilistic ensemble clustering algorithm (PEnCA). Moreover, because the problem is placed in a probabilistic framework, model selection criteria can be used to choose the number of clusters automatically. To compare our method with other combination techniques (also based on probabilistic modeling of the clustering ensemble problem), we performed experiments with synthetic and real benchmark datasets, showing that the proposed approach leads to competitive results.
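The pipeline described in the abstract (co-association counting, an aspect model fitted by EM, consensus extraction, model selection) can be illustrated with a minimal sketch. It is not the authors' PEnCA implementation and rests on several assumptions: co-association counts are treated as dyad counts n_ij; the standard asymmetric aspect model P(i, j) = Σ_z P(z) P(i|z) P(j|z) is used (the paper's exact parameterization may differ); each object is assigned to argmax_z P(z|i); and a BIC-style penalty stands in for whichever model selection criterion the paper actually uses. The function names `co_association`, `fit_aspect_model`, `consensus_labels` and `select_k` are hypothetical.

```python
import numpy as np


def co_association(ensemble):
    """Count, for every pair (i, j), how many partitions put i and j in the same cluster.

    ensemble: (M, N) integer array; row m holds the labels of the N objects in
    the m-th partition. Labels are compared only within a row, so no cluster
    correspondence across partitions is needed.
    """
    ensemble = np.asarray(ensemble)
    n = ensemble.shape[1]
    counts = np.zeros((n, n))
    for labels in ensemble:
        # Boolean N x N matrix: True where two objects share a label in this partition.
        counts += labels[:, None] == labels[None, :]
    return counts


def log_likelihood(counts, p_z, p_i_z, p_j_z):
    """Log-likelihood of dyad counts n_ij under P(i, j) = sum_z P(z) P(i|z) P(j|z)."""
    p_ij = np.einsum("z,iz,jz->ij", p_z, p_i_z, p_j_z)
    return float(np.sum(counts * np.log(np.maximum(p_ij, 1e-300))))


def fit_aspect_model(counts, k, n_iter=300, tol=1e-9, seed=0):
    """EM for an aspect model with k aspects, started from a random initialization."""
    rng = np.random.default_rng(seed)
    n = counts.shape[0]
    p_z = np.full(k, 1.0 / k)
    p_i_z = rng.random((n, k))
    p_i_z /= p_i_z.sum(axis=0)  # each column is a distribution P(i | z)
    p_j_z = rng.random((n, k))
    p_j_z /= p_j_z.sum(axis=0)  # each column is a distribution P(j | z)
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: posterior P(z | i, j) for every dyad, shape (n, n, k).
        joint = p_z * p_i_z[:, None, :] * p_j_z[None, :, :]
        post = joint / np.maximum(joint.sum(axis=2, keepdims=True), 1e-300)
        # M-step: expected counts per aspect give the new P(z), P(i|z), P(j|z).
        weighted = counts[:, :, None] * post
        mass = np.maximum(weighted.sum(axis=(0, 1)), 1e-300)
        p_z = mass / mass.sum()
        p_i_z = weighted.sum(axis=1) / mass
        p_j_z = weighted.sum(axis=0) / mass
        ll = log_likelihood(counts, p_z, p_i_z, p_j_z)
        if ll - prev < tol:  # stop once the likelihood improvement falls below tol
            break
        prev = ll
    return p_z, p_i_z, p_j_z, ll


def consensus_labels(counts, k, n_restarts=5):
    """Keep the best of several EM runs; assign object i to argmax_z P(z | i)."""
    fits = [fit_aspect_model(counts, k, seed=s) for s in range(n_restarts)]
    p_z, p_i_z, _, ll = max(fits, key=lambda fit: fit[3])
    # P(z | i) is proportional to P(z) P(i | z).
    return np.argmax(p_z * p_i_z, axis=1), ll


def select_k(counts, k_values, n_restarts=5):
    """Choose k by a BIC-style score (an assumed stand-in for the paper's criterion)."""
    n, n_obs = counts.shape[0], counts.sum()
    best = None
    for k in k_values:
        labels, ll = consensus_labels(counts, k, n_restarts)
        n_params = (k - 1) + 2 * k * (n - 1)  # free parameters of P(z), P(i|z), P(j|z)
        score = ll - 0.5 * n_params * np.log(n_obs)
        if best is None or score > best[0]:
            best = (score, k, labels)
    return best[1], best[2]
```

For instance, an ensemble whose partitions always keep objects 0 to 2 apart from objects 3 to 5 yields a block-diagonal co-association matrix, and `consensus_labels(co_association(ensemble), 2)` should separate the two blocks. Restarting EM from several seeds is a common guard against poor local optima. The (N, N, k) posterior array makes this sketch quadratic in memory, which is acceptable for illustration but not for large N.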
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lourenço, A., Fred, A., Figueiredo, M. (2011). A Generative Dyadic Aspect Model for Evidence Accumulation Clustering. In: Pelillo, M., Hancock, E.R. (eds) Similarity-Based Pattern Recognition. SIMBAD 2011. Lecture Notes in Computer Science, vol 7005. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24471-1_8
DOI: https://doi.org/10.1007/978-3-642-24471-1_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24470-4
Online ISBN: 978-3-642-24471-1
eBook Packages: Computer Science, Computer Science (R0)