Abstract
Biclustering is a two-dimensional data analysis technique that searches for submatrices of a given data matrix. Its extension to three-dimensional data is called triclustering. This paper presents a generalized approach to n-clustering of n-dimensional binary data. The search is carried out within the Boolean reasoning paradigm: the original problem (the data) is encoded as a Boolean formula whose prime implicants correspond to the solutions of the original problem. Both the correctness of this approach (every n-cluster found contains only 0s or only 1s) and its maximality (no n-cluster can be expanded in any dimension without violating the correctness requirement) rest on strong mathematical foundations. The paper also applies Boolean reasoning-based n-clustering to small three- and four-dimensional artificial data sets as well as to some biomedical data.
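The two properties named in the abstract, correctness and maximality, can be made concrete with a small brute-force check. The sketch below is an illustration only and is not the Boolean reasoning procedure of the paper: the nested-list tensor layout, the representation of an n-cluster as one index list per dimension, and the helper names `cell`, `shape`, `is_correct` and `is_maximal` are all assumptions introduced here.

```python
from itertools import product


def cell(tensor, index):
    """Return the value stored at a multi-index in a nested-list tensor."""
    for i in index:
        tensor = tensor[i]
    return tensor


def shape(tensor):
    """Return the size of each dimension of a rectangular nested-list tensor."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return dims


def is_correct(tensor, cluster, value):
    """Correctness: every cell selected by the n-cluster equals `value`."""
    return all(cell(tensor, idx) == value for idx in product(*cluster))


def is_maximal(tensor, cluster, value):
    """Maximality: the n-cluster is correct and adding any single index
    in any dimension breaks correctness."""
    if not is_correct(tensor, cluster, value):
        return False
    for d, size in enumerate(shape(tensor)):
        for extra in set(range(size)) - set(cluster[d]):
            grown = list(cluster)
            grown[d] = sorted(set(cluster[d]) | {extra})
            if is_correct(tensor, grown, value):
                return False
    return True


# Hypothetical 2 x 2 x 3 binary tensor used only for illustration.
data = [
    [[1, 1, 0], [1, 1, 0]],
    [[1, 1, 1], [0, 1, 1]],
]

tricluster_a = [[0], [0, 1], [0, 1]]  # cannot be grown in any dimension
tricluster_b = [[0], [0], [0, 1]]     # can still be grown in the second dimension

print(is_maximal(data, tricluster_a, 1))
print(is_correct(data, tricluster_b, 1), is_maximal(data, tricluster_b, 1))
```

Testing single-index extensions is sufficient here: if an n-cluster could be enlarged by several indices while staying correct, then enlarging it by just one of those indices would also leave it correct.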
Cite this article
Michalak, M. Theoretical backgrounds of Boolean reasoning-based binary n-clustering. Knowl Inf Syst 64, 2171–2188 (2022). https://doi.org/10.1007/s10115-022-01708-2
DOI: https://doi.org/10.1007/s10115-022-01708-2