Diagonal latent block model for binary data

Laclau, Charlotte; Nadif, Mohamed

doi:10.1007/s11222-016-9677-7

Published: 29 June 2016

Volume 27, pages 1145–1163, (2017)

Statistics and Computing

Charlotte Laclau¹ &
Mohamed Nadif¹

Abstract

This paper addresses the problem of co-clustering binary data in the latent block model framework, with diagonal constraints on the resulting data partitions. We consider the Bernoulli generative mixture model and present three new methods that differ in the assumptions made about the degree of homogeneity of the diagonal blocks. The proposed models are parsimonious and make it possible to take the structure of a data matrix into account when reorganizing it into homogeneous diagonal blocks. For each of the presented models, we derive an algorithm based on the classification expectation-maximization algorithm, which maximizes the complete data likelihood. We show that our contribution can outperform other state-of-the-art (co-)clustering methods on synthetic sparse and non-sparse data. We also demonstrate the efficiency of our approach in the context of document clustering, using real-world benchmark data sets.
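
To make the idea of a classification EM (CEM) loop with diagonal constraints concrete, the following is a minimal sketch under assumptions that are not taken from the paper. It assumes a single two-parameter Bernoulli parameterization (probability `p` of a one inside a diagonal block, `q` outside), equal mixing proportions, and user-supplied initial labels. The function names `diagonal_cem`, `_estimate`, and `_assign` are hypothetical, and the sketch does not reproduce any of the three models the abstract mentions.

```python
import math

def _clip(v, eps=1e-6):
    # Keep probabilities away from 0 and 1 so the logarithms stay finite.
    return min(max(v, eps), 1.0 - eps)

def _estimate(X, z, w):
    # M-step (assumed parameterization): proportion of ones inside (p)
    # and outside (q) the diagonal blocks defined by labels z and w.
    on = on_n = off = off_n = 0
    for i, row in enumerate(X):
        for j, x in enumerate(row):
            if z[i] == w[j]:
                on += x
                on_n += 1
            else:
                off += x
                off_n += 1
    p = _clip(on / on_n) if on_n else 0.5
    q = _clip(off / off_n) if off_n else 0.5
    return p, q

def _assign(X, other, g, p, q):
    # C-step: give each row of X the label k that maximizes its
    # complete-data Bernoulli log-likelihood, holding `other` fixed.
    # Passing the transpose of the data assigns columns instead.
    sizes = [other.count(k) for k in range(g)]
    d = len(other)
    labels = []
    for row in X:
        total = sum(row)
        best, best_score = 0, -math.inf
        for k in range(g):
            s = sum(x for x, lab in zip(row, other) if lab == k)
            score = (s * math.log(p)
                     + (sizes[k] - s) * math.log(1 - p)
                     + (total - s) * math.log(q)
                     + (d - sizes[k] - (total - s)) * math.log(1 - q))
            if score > best_score:
                best, best_score = k, score
        labels.append(best)
    return labels

def diagonal_cem(X, g, z, w, max_iter=50):
    # Alternate parameter estimation and hard reassignment of rows and
    # columns until the partitions stop changing.
    Xt = [list(col) for col in zip(*X)]
    for _ in range(max_iter):
        p, q = _estimate(X, z, w)
        new_z = _assign(X, w, g, p, q)
        p, q = _estimate(X, new_z, w)
        new_w = _assign(Xt, new_z, g, p, q)
        if new_z == z and new_w == w:
            break
        z, w = new_z, new_w
    p, q = _estimate(X, z, w)
    return z, w, p, q

# Toy data: two clean diagonal blocks, with one row mislabeled at the start.
X = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1]]
z, w, p, q = diagonal_cem(X, g=2, z=[0, 1, 1, 1], w=[0, 0, 1, 1])
```

On this toy matrix the loop moves the mislabeled row into the first cluster and stops once both partitions are stable. Because rows and columns share the same label set, row cluster `k` is paired with column cluster `k` by construction, which is what places the dense blocks on the diagonal. The sketch does not enforce `p > q`, does not handle empty clusters beyond a neutral fallback, and depends on the initial labels, as hard-assignment schemes of this kind generally do.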

Acknowledgments

This work has been funded by AAP Sorbonne Paris Cité.

Author information

Authors and Affiliations

LIPADE, Paris Descartes University, 45, rue des Saints Pères, 75270, Paris, France
Charlotte Laclau & Mohamed Nadif

Authors

Charlotte Laclau
Mohamed Nadif

Corresponding author

Correspondence to Charlotte Laclau.

About this article

Cite this article

Laclau, C., Nadif, M. Diagonal latent block model for binary data. Stat Comput 27, 1145–1163 (2017). https://doi.org/10.1007/s11222-016-9677-7

Received: 07 December 2015
Accepted: 11 June 2016
Published: 29 June 2016
Issue Date: September 2017
DOI: https://doi.org/10.1007/s11222-016-9677-7
