Abstract
Block clustering aims to reveal homogeneous block structures in a data table. Among the different approaches to block clustering, we consider here a model-based method: the Gaussian latent block model for continuous data, which extends the Gaussian mixture model for one-way clustering. For a given data table, several candidate models are usually examined, which differ, for example, in the number of clusters. Model selection then becomes a critical issue. To this end, we develop a criterion based on an approximation of the integrated classification likelihood for the Gaussian latent block model, and propose a Bayesian information criterion-like variant following the same pattern. We also propose a non-asymptotic exact criterion, thus circumventing the controversial definition of the asymptotic regime arising from the dual nature of the rows and columns in co-clustering. The experimental results show steady performance of these criteria for medium to large data tables.
References
Banerjee A, Dhillon I, Ghosh J, Merugu S (2007) A generalized maximum entropy approach to Bregman co-clustering and matrix approximation. J Mach Learn Res 8:1919–1986
Berkhin P (2006) A survey of clustering data mining techniques. Springer, Berlin
Biernacki C, Celeux G, Govaert G (1998) Assessing a mixture model for clustering with the integrated classification likelihood. Tech. rep, INRIA
Biernacki C, Celeux G, Govaert G (2000) Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans Pattern Anal Mach Intell 22(7):719–725
Biernacki C, Celeux G, Govaert G (2010) Exact and Monte Carlo calculations of integrated likelihoods for the latent class model. J Stat Plan Infer 140(11):2991–3002
Charrad M, Lechevallier Y, Saporta G, Ben Ahmed M (2010) Détermination du nombre de classes dans les méthodes de bipartitionnement. In: 17ème Rencontres de la Société Francophone de Classification, Saint-Denis de la Réunion, pp 119–122
Daudin JJ, Picard F, Robin S (2008) A mixture model for random graphs. Stat Comput 18(2):173–183
Fraley C, Raftery AE (1998) How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput J 41(8):578–588
Gelman A, Carlin JB, Stern HS, Rubin DB (2004) Bayesian data analysis. CRC, Boca Raton
Good IJ (1965) Categorization of classification. Mathematics and Computer Science in Biology and Medicine, Her Majesty’s Stationery Office
Govaert G (1977) Algorithme de classification d’un tableau de contingence. In: First international symposium on data analysis and informatics, INRIA, Versailles
Govaert G (1995) Simultaneous clustering of rows and columns. Control Cybern 24(4):437–458
Govaert G, Nadif M (2003) Clustering with block mixture models. Pattern Recogn 36:463–473
Hartigan JA (1972) Direct clustering of a data matrix. J Am Stat Assoc 67:123–129
Hartigan JA (2000) Bloc voting in the United States senate. J Classif 17(1):29–49
Jagalur M, Pal C, Learned-Miller E, Zoeller RT, Kulp D (2007) Analyzing in situ gene expression in the mouse brain with image registration, feature extraction and block clustering. BMC Bioinforma 8(Suppl 10):S5
Kemp C, Griffiths TL, Tenenbaum JB (2004) Discovering latent classes in relational data. Tech. rep., Computer Science and Artificial Intelligence Laboratory
Keribin C, Brault V, Celeux G, Govaert G (2012) Model selection for the binary latent block model. In: Colubi A, Fokianos K, Gonzalez-Rodriguez G, Kontoghiorghes EJ (eds) Proceedings of Compstat 2012, 20th international conference on computational statistics, The International Statistical Institute/International Association for Statistical Computing, pp 379–390
Keribin C, Brault V, Celeux G, Govaert G (2013) Estimation and selection for the latent block model on categorical data. Tech. rep., INRIA
Kluger Y, Basri R, Chang JT, Gerstein M (2003) Spectral biclustering of microarray data: coclustering genes and conditions. Genome Res 13(4):703–716
Lomet A, Govaert G, Grandvalet Y (2012a) Design of artificial data tables for co-clustering analysis. Tech. rep., Université de Technologie de Compiègne
Lomet A, Govaert G, Grandvalet Y (2012b) Model selection in block clustering by the integrated classification likelihood. In: Colubi A, Fokianos K, Gonzalez-Rodriguez G, Kontoghiorghes EJ (eds) Proceedings of Compstat 2012, 20th international conference on computational statistics, The International Statistical Institute/International Association for Statistical Computing, pp 519–530
Mariadassou M, Matias C (2012) Convergence of the groups posterior distribution in latent or stochastic block models. Tech. rep., arXiv
McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, New York
Nadif M, Govaert G (2008) Algorithms for model-based block Gaussian clustering. In: DMIN’08, the 2008 international conference on data mining, Las Vegas, Nevada, USA
Richardson S, Green PJ (1997) On Bayesian analysis of mixtures with an unknown number of components (with discussion). J R Stat Soc Ser B Stat Methodol 59(4):731–792
Robert C (2001) The Bayesian choice. Springer, Berlin
Rocci R, Vichi M (2008) Two-mode multi-partitioning. Comput Stat Data Anal 52(4):1984–2003
Schepers J, Ceulemans E, Van Mechelen I (2008) Selecting among multi-mode partitioning models of different complexities: a comparison of four model selection criteria. J Classif 25(1):67–85
Seldin Y, Tishby N (2010) PAC-Bayesian analysis of co-clustering and beyond. J Mach Learn Res 11:3595–3646
Shan H, Banerjee A (2008) Bayesian co-clustering. In: 8th IEEE international conference on data mining, 2008. ICDM’08, pp 530–539
Van Dijk B, Van Rosmalen J, Paap R (2009) A Bayesian approach to two-mode clustering. Tech. Rep. 2009–06, Econometric Institute. http://hdl.handle.net/1765/15112
Wyse J, Friel N (2012) Block clustering with collapsed latent block models. Stat Comput 22(1):415–428
Acknowledgments
We thank the reviewers and associate editor for their valuable inputs. This work, carried out in the framework of the Labex MS2T (ANR-11-IDEX-0004-02), was partially funded by the French National Agency for Research under grant ClasSel ANR-08-EMER-002 and the European ICT FP7 under grant No 247022-MASH.
Appendices
Appendix A: Derivation of the approximation of \(\textit{ICL}\)
The first term of the expansion (2) (\(\log p(\mathbf {X}|\mathbf {z},\mathbf {w},M)\)) can be approximated in a BIC-like fashion, since the table entries are independent conditionally on the row/column partitions:
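As a sketch of this BIC-like approximation (a reconstruction from the surrounding text, not the article's exact display), the conditional independence of the \(nd\) table entries yields:

```latex
\log p(\mathbf{X}|\mathbf{z},\mathbf{w},M)
  \approx \max_{\varvec{\alpha}} \log p(\mathbf{X}|\mathbf{z},\mathbf{w},\varvec{\alpha},M)
  - \frac{\lambda}{2} \log (nd)
```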
where \(\lambda \) is the dimensionality of vector \(\varvec{\alpha }\) (that is, of \(\mathcal {A}\)).
The two terms \(\log p(\mathbf {z}|M)\) and \(\log p(\mathbf {w}|M)\) can be computed exactly by taking an informative prior distribution on \(\varvec{\pi }\) and \(\varvec{\rho }\) when the proportion parameters are free. Indeed, a Dirichlet distribution \(\mathcal {D}(\delta ,\ldots ,\delta )\) yields:
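Integrating \(\varvec{\pi}\) out under this Dirichlet prior gives the standard Dirichlet–multinomial closed form (a reconstruction; the column term is analogous with \(\varvec{\rho}\), \(d_\ell\), \(m\) and \(d\)):

```latex
p(\mathbf{z}|M)
  = \frac{\Gamma(g\delta)}{\Gamma(\delta)^{g}}
    \, \frac{\prod_{k=1}^{g} \Gamma(n_k+\delta)}{\Gamma(n+g\delta)}
```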
where \(n_k\) is the number of rows in the cluster \(k\). The details of calculations are given by Robert (2001).
Using non-informative Jeffreys prior distributions for the proportion parameters (\(\delta =1/2\)), the log-priors are:
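Taking logs of the Dirichlet–multinomial form with \(\delta=1/2\) gives (a reconstruction from the surrounding derivation):

```latex
\log p(\mathbf{z}|M)
  = \log \Gamma\!\left(\tfrac{g}{2}\right)
  - g \log \Gamma\!\left(\tfrac{1}{2}\right)
  + \sum_{k=1}^{g} \log \Gamma\!\left(n_k+\tfrac{1}{2}\right)
  - \log \Gamma\!\left(n+\tfrac{g}{2}\right)
```

with the analogous expression for \(\log p(\mathbf{w}|M)\).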
Because \((\mathbf {z}, \mathbf {w})\) are unknown, we replace them by their estimates \((\hat{\mathbf {z}}, \hat{\mathbf {w}})\) obtained by the VEM algorithm. When \(\hat{n}_k\) and \(\hat{d}_\ell \) are large enough, the approximation of the Gamma function by the Stirling formula \( \varGamma (t+1) \approx t^{t+1/2} \exp (-t) (2\pi )^{1/2}\) can be used. Neglecting terms of order \(O(1)\), the log-prior distributions are then approximated as follows:
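As a sketch, applying Stirling's formula to each Gamma term and dropping \(O(1)\) contributions yields:

```latex
\log p(\hat{\mathbf{z}}|M)
  \approx \sum_{k=1}^{g} \hat{n}_k \log \frac{\hat{n}_k}{n}
  - \frac{g-1}{2} \log n
```

and similarly \(\log p(\hat{\mathbf{w}}|M) \approx \sum_{\ell=1}^{m} \hat{d}_\ell \log \frac{\hat{d}_\ell}{d} - \frac{m-1}{2} \log d\).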
In addition, \(\sum _{k=1}^g \hat{n}_k \log \frac{\hat{n}_k}{n}=\max _{\varvec{\pi }} \log p(\hat{\mathbf {z}}| \varvec{\pi }, M)\) and \(\sum _{\ell =1}^m \hat{d}_\ell \log \frac{\hat{d}_\ell }{d}=\max _{\varvec{\rho }} \log p(\hat{\mathbf {w}}| \varvec{\rho }, M)\) (see Robert 2001; Biernacki et al. 2010). For \(\delta =1/2\), we obtain:
Then, the ICL criterion can be approximated by:
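Collecting the three approximated terms, the resulting BIC-like criterion takes the form (a reconstruction consistent with the identities above, where \(\theta\) gathers \((\varvec{\pi},\varvec{\rho},\varvec{\alpha})\)):

```latex
\textit{ICL}_{\text{BIC}}(M)
  = \max_{\theta} \log p(\mathbf{X},\hat{\mathbf{z}},\hat{\mathbf{w}}|\theta,M)
  - \frac{g-1}{2} \log n
  - \frac{m-1}{2} \log d
  - \frac{\lambda}{2} \log (nd)
```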
Appendix B: Derivation of exact \(\textit{ICL}\)
The criterion \(\textit{ICL}\) can be broken down into three terms:
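With \(\mathbf{z}\) and \(\mathbf{w}\) a priori independent given \(M\), the decomposition reads:

```latex
\log p(\mathbf{X},\mathbf{z},\mathbf{w}|M)
  = \log p(\mathbf{X}|\mathbf{z},\mathbf{w},M)
  + \log p(\mathbf{z}|M)
  + \log p(\mathbf{w}|M)
```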
Then, the first term of the expansion (2) is rewritten using the following decomposition:
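This decomposition holds for any value of \(\varvec{\alpha}\) (a sketch of the standard identity sometimes called the candidate's formula):

```latex
\log p(\mathbf{X}|\mathbf{z},\mathbf{w},M)
  = \log p(\mathbf{X}|\mathbf{z},\mathbf{w},\varvec{\alpha},M)
  + \log p(\varvec{\alpha}|M)
  - \log p(\varvec{\alpha}|\mathbf{X},\mathbf{z},\mathbf{w},M)
```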
where \(p(\varvec{\alpha }|M)\) and \(p(\varvec{\alpha }| \mathbf {X},\mathbf {z},\mathbf {w},M)\) are respectively the prior and posterior distributions of \(\varvec{\alpha }\).
For the latent block model with different variances, given the row and column labels, the entries \(x_{ij}\) of each block are independent and identically distributed. We thus apply the standard results for Gaussian samples (Gelman et al. 2004), where the distributions are defined by:
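One standard conjugate choice for each block is the normal–inverse-gamma prior (a sketch; the hyperparameters \(a\), \(b\), \(\mu_0\) and \(n_0\) are placeholders, not necessarily the article's notation):

```latex
\sigma^2_{k\ell} \sim \mathcal{IG}(a, b),
\qquad
\mu_{k\ell} \,|\, \sigma^2_{k\ell} \sim \mathcal{N}\!\left(\mu_0, \frac{\sigma^2_{k\ell}}{n_0}\right)
```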
Using the definitions of these distributions, the first term of the expansion (2),
is identified, after some calculations, as (3).
For the latent block model with equal variances, the standard results need to be adapted to account for the shared parameter \(\sigma ^2\). The prior distributions are now defined as follows:
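A natural adaptation (a sketch under the same placeholder hyperparameters) shares a single inverse-gamma prior on \(\sigma^2\) across blocks, with conditionally independent block means:

```latex
\sigma^2 \sim \mathcal{IG}(a, b),
\qquad
\mu_{k\ell} \,|\, \sigma^2 \sim \mathcal{N}\!\left(\mu_0, \frac{\sigma^2}{n_0}\right),
\quad k=1,\dots,g,\ \ell=1,\dots,m
```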
The posterior distribution is then computed thanks to Bayes’ formula
This probability can be factorized:
Thus, the posterior distribution is defined (assuming the posterior independence of \(\mu _{k \ell }\)):
For the terms related to the proportions, when the proportions are free, we assume a symmetric Dirichlet prior distribution of parameters \((\delta _0,\ldots ,\delta _0)\) for the row and column parameters \((\varvec{\pi },\varvec{\rho })\), so that:
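Integrating out the proportions then gives the same Dirichlet–multinomial closed form as in Appendix A, with \(\delta_0\) in place of \(\delta\) (a reconstruction):

```latex
p(\mathbf{z}|M)
  = \frac{\Gamma(g\delta_0) \prod_{k=1}^{g} \Gamma(n_k+\delta_0)}
         {\Gamma(\delta_0)^{g}\,\Gamma(n+g\delta_0)},
\qquad
p(\mathbf{w}|M)
  = \frac{\Gamma(m\delta_0) \prod_{\ell=1}^{m} \Gamma(d_\ell+\delta_0)}
         {\Gamma(\delta_0)^{m}\,\Gamma(d+m\delta_0)}
```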
More details are given by Biernacki et al. (1998).
Cite this article
Lomet, A., Govaert, G. & Grandvalet, Y. Model selection for Gaussian latent block clustering with the integrated classification likelihood. Adv Data Anal Classif 12, 489–508 (2018). https://doi.org/10.1007/s11634-013-0161-3
Keywords
- Co-clustering
- Latent block model
- Model selection
- Continuous data
- Integrated classification likelihood
- BIC