Abstract
This paper addresses the problem of co-clustering binary data in the latent block model framework with diagonal constraints on the resulting data partitions. We consider the Bernoulli generative mixture model and present three new methods that differ in the assumptions made about the degree of homogeneity of the diagonal blocks. The proposed models are parsimonious and make it possible to account for the structure of a data matrix when reorganizing it into homogeneous diagonal blocks. For each of the presented models, we derive an algorithm based on the classification expectation-maximization algorithm, which maximizes the complete-data likelihood. We show that our contribution can outperform other state-of-the-art (co)-clustering methods on synthetic sparse and non-sparse data. We also demonstrate the efficiency of our approach in the context of document clustering, using real-world benchmark data sets.
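To make the idea of a classification EM procedure under a diagonal constraint more concrete, here is a minimal sketch in plain Python. It is not one of the paper's three algorithms: it assumes a single Bernoulli rate `p` shared by all diagonal blocks and a single rate `q` everywhere else, equal mixing proportions, and a simple deterministic initialization. The function names (`estimate`, `assign`, `diagonal_cem`) and this parameterization are hypothetical, chosen for illustration only.

```python
import math

def estimate(X, z, w, eps=1e-6):
    """M-step (assumed model): Bernoulli rate inside diagonal blocks (p) and outside (q)."""
    ones_in = cells_in = ones_out = cells_out = 0
    for i, row in enumerate(X):
        for j, x in enumerate(row):
            if z[i] == w[j]:          # cell lies in a diagonal block
                ones_in += x
                cells_in += 1
            else:                     # cell lies outside the diagonal blocks
                ones_out += x
                cells_out += 1
    p = ones_in / cells_in if cells_in else 0.5
    q = ones_out / cells_out if cells_out else 0.5
    # Clamp away from 0 and 1 so the logarithms below stay finite.
    return min(max(p, eps), 1 - eps), min(max(q, eps), 1 - eps)

def assign(vectors, other_labels, g, p, q):
    """C-step for one side: give each vector the label with the highest log-likelihood."""
    labels = []
    for v in vectors:
        scores = []
        for k in range(g):
            s = 0.0
            for x, lab in zip(v, other_labels):
                theta = p if lab == k else q
                s += math.log(theta) if x else math.log(1 - theta)
            scores.append(s)
        labels.append(scores.index(max(scores)))  # ties go to the lowest label
    return labels

def diagonal_cem(X, g, max_iter=50):
    """Alternate row and column assignments with parameter updates until labels are stable."""
    n, d = len(X), len(X[0])
    z = [i % g for i in range(n)]     # assumed initialization, not from the paper
    w = [j % g for j in range(d)]
    columns = [list(col) for col in zip(*X)]
    for _ in range(max_iter):
        p, q = estimate(X, z, w)
        new_z = assign(X, w, g, p, q)
        p, q = estimate(X, new_z, w)
        new_w = assign(columns, new_z, g, p, q)
        if new_z == z and new_w == w:
            break
        z, w = new_z, new_w
    return z, w, estimate(X, z, w)
```

Calling `diagonal_cem(X, g)` on a binary matrix `X` given as a list of rows returns row labels, column labels, and the estimated pair `(p, q)`. Rows and columns share the same `g` labels, which is what ties each row cluster to one column cluster and yields the diagonal block structure. A practical implementation would use several random restarts, since this kind of alternating scheme only reaches a local optimum.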
Acknowledgments
This work has been funded by AAP Sorbonne Paris Cité.
About this article
Cite this article
Laclau, C., Nadif, M. Diagonal latent block model for binary data. Stat Comput 27, 1145–1163 (2017). https://doi.org/10.1007/s11222-016-9677-7
DOI: https://doi.org/10.1007/s11222-016-9677-7