Consensus Algorithm for Bi-clustering Analysis

  • Conference paper
Computational Science – ICCS 2022 (ICCS 2022)

Abstract

Bi-clustering is an unsupervised data mining technique that clusters the rows and columns of a two-dimensional data matrix concurrently. It has been shown to allow accurate and comprehensive mining of information that is important for many practical applications. Numerous bi-clustering algorithms have been proposed in the literature; they are based on different approaches and, in general, produce different outputs. In this paper we propose a consensus method that combines the outputs of many bi-clustering algorithms to improve the quality of predictions. The proposed algorithm has two steps. The first step, “assignment”, detects groups of highly similar bi-clusters; the second step, “trimming”, transforms each group of similar bi-clusters into a single bi-cluster of high quality. We demonstrate, on both simulated and real datasets, that our algorithm substantially improves the quality of bi-clustering. We also provide an easy-to-use software tool, which includes implementations of several bi-clustering algorithms and of our consensus method.
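The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the similarity measure, the assignment procedure, or the trimming rule, so this sketch assumes a Jaccard index over matrix cells, a greedy match against the first algorithm's bi-clusters, and a majority vote over rows and columns. All names (`similarity`, `assign`, `trim`, `consensus`, `min_share`) are hypothetical.

```python
from collections import Counter

# A bi-cluster is modelled as a pair (set of row indices, set of column indices).

def similarity(a, b):
    """Jaccard index over matrix cells (an assumed measure, not from the paper)."""
    (ra, ca), (rb, cb) = a, b
    shared = len(ra & rb) * len(ca & cb)
    total = len(ra) * len(ca) + len(rb) * len(cb) - shared
    return shared / total if total else 0.0

def assign(results):
    """Assignment step (greedy stand-in): group similar bi-clusters.

    `results` holds one list of bi-clusters per algorithm. The first list
    seeds the groups; each seed takes its most similar unused bi-cluster
    from every other list.
    """
    seeds, *others = results
    groups = [[seed] for seed in seeds]
    for biclusters in others:
        unused = list(biclusters)
        for group in groups:
            if not unused:
                break
            best = max(unused, key=lambda b: similarity(group[0], b))
            group.append(best)
            unused.remove(best)
    return groups

def trim(group, min_share=0.5):
    """Trimming step (majority vote): keep rows and columns that occur
    in at least `min_share` of the bi-clusters in the group."""
    need = min_share * len(group)
    row_votes = Counter(r for rows, _ in group for r in rows)
    col_votes = Counter(c for _, cols in group for c in cols)
    rows = frozenset(r for r, n in row_votes.items() if n >= need)
    cols = frozenset(c for c, n in col_votes.items() if n >= need)
    return rows, cols

def consensus(results, min_share=0.5):
    """Run assignment, then trim every group into one bi-cluster."""
    return [trim(group, min_share) for group in assign(results)]

# Made-up outputs of three algorithms on the same matrix.
example = [
    [({0, 1, 2}, {0, 1}), ({5, 6}, {3, 4})],
    [({5, 6, 7}, {3, 4}), ({0, 1, 2, 3}, {0, 1})],
    [({0, 1}, {0, 1, 2}), ({5, 6}, {3, 4, 5})],
]
merged = consensus(example)
```

On this toy input, `merged` contains two bi-clusters: rows {0, 1, 2} with columns {0, 1}, and rows {5, 6} with columns {3, 4}. Rows and columns reported by only one of the three algorithms are trimmed away. A greedy match is order-dependent; an optimal assignment solver could be substituted for `assign` without changing `trim`.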

Acknowledgements

This work was funded by the Polish National Science Centre, OPUS grant 2016/21/B/ST6/02153 (AP, WL); by a research project (RAU-6, 2020); and by projects for young scientists of the Silesian University of Technology (Gliwice, Poland) (PF, MS).

Corresponding author

Correspondence to Paweł Foszner.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 909 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Foszner, P., Labaj, W., Polanski, A., Staniszewski, M. (2022). Consensus Algorithm for Bi-clustering Analysis. In: Groen, D., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds) Computational Science – ICCS 2022. ICCS 2022. Lecture Notes in Computer Science, vol 13351. Springer, Cham. https://doi.org/10.1007/978-3-031-08754-7_61

  • DOI: https://doi.org/10.1007/978-3-031-08754-7_61

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-08753-0

  • Online ISBN: 978-3-031-08754-7

  • eBook Packages: Computer Science, Computer Science (R0)
