Diagonal latent block model for binary data

Statistics and Computing

Abstract

This paper addresses the problem of co-clustering binary data in the latent block model framework, with diagonal constraints on the resulting data partitions. We consider the Bernoulli generative mixture model and present three new methods that differ in the assumptions made about the degree of homogeneity of the diagonal blocks. The proposed models are parsimonious and make it possible to take the structure of a data matrix into account when reorganizing it into homogeneous diagonal blocks. For each of the presented models, we derive an algorithm based on the classification expectation-maximization algorithm, which maximizes the complete data likelihood. We show that our contribution can outperform other state-of-the-art (co)-clustering methods on synthetic sparse and non-sparse data. We also demonstrate the efficiency of our approach in the context of document clustering, using real-world benchmark data sets.
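
To make the idea of a diagonal co-clustering criterion concrete, the following is a minimal sketch and not the authors' algorithm. It assumes, purely for illustration, one Bernoulli parameter per diagonal block and a single shared parameter for all off-diagonal cells, with each parameter set to its maximum-likelihood value for the given partition; mixing proportions are omitted. The function names and the toy matrix are hypothetical, and the three models in the paper may be parameterized differently.

```python
import math


def bernoulli_loglik(ones, total, eps=1e-9):
    """Bernoulli log-likelihood of `ones` ones among `total` cells,
    evaluated at the (clamped) maximum-likelihood parameter."""
    if total == 0:
        return 0.0
    p = min(max(ones / total, eps), 1.0 - eps)  # clamp to avoid log(0)
    return ones * math.log(p) + (total - ones) * math.log(1.0 - p)


def diagonal_block_loglik(x, row_labels, col_labels, n_clusters):
    """Classification log-likelihood of a binary matrix `x` under an
    ASSUMED diagonal block structure: cell (i, j) belongs to diagonal
    block k when row i and column j both carry label k; every other
    cell shares one off-diagonal Bernoulli parameter."""
    ones_in = [0] * n_clusters
    cells_in = [0] * n_clusters
    ones_out = 0
    cells_out = 0
    for i, row in enumerate(x):
        k = row_labels[i]
        for j, value in enumerate(row):
            if k == col_labels[j]:
                ones_in[k] += value
                cells_in[k] += 1
            else:
                ones_out += value
                cells_out += 1
    total = bernoulli_loglik(ones_out, cells_out)
    for k in range(n_clusters):
        total += bernoulli_loglik(ones_in[k], cells_in[k])
    return total


# Toy 4x4 matrix with two perfect diagonal blocks (hypothetical data).
x = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
good = diagonal_block_loglik(x, [0, 0, 1, 1], [0, 0, 1, 1], 2)
bad = diagonal_block_loglik(x, [0, 1, 0, 1], [0, 1, 0, 1], 2)
print(good > bad)
```

A classification EM procedure of the kind mentioned in the abstract would alternate between reassigning row and column labels and re-estimating the parameters so that a criterion like this one does not decrease; the sketch only evaluates the criterion for two fixed partitions.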

Acknowledgments

This work has been funded by AAP Sorbonne Paris Cité.

Author information

Corresponding author

Correspondence to Charlotte Laclau.

About this article

Cite this article

Laclau, C., Nadif, M. Diagonal latent block model for binary data. Stat Comput 27, 1145–1163 (2017). https://doi.org/10.1007/s11222-016-9677-7

  • DOI: https://doi.org/10.1007/s11222-016-9677-7
