Abstract
Analyzing and understanding large and complex volumes of biological data is a challenging task as such data become more widely available. In biomedical research, gene expression data are among the most commonly used biological data. Formal concept analysis (FCA) is frequently used to identify differentially expressed genes in microarray data, and mining only the Top-K formal concepts is an effective way to obtain the most relevant formal concepts. To our knowledge, no existing algorithm addresses the difficult task of identifying only the important biclusters. For this purpose, we propose Top-BicMiner, a new Top-K formal concepts-based algorithm for mining biclusters from gene expression data. It extracts sets of biclusters with positively and negatively correlated genes according to distinct correlation measures. The proposed method is applied to both synthetic and real-life microarray datasets, and the experimental results highlight Top-BicMiner's ability to identify statistically and biologically significant biclusters.
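Since Top-BicMiner builds on formal concepts, the following Python sketch illustrates the underlying idea on a toy binary gene × condition context: each formal concept (extent, intent) is a maximal bicluster of genes that share a set of conditions, and a Top-K step keeps only K of them. The naive enumeration, the toy context, and the area-based score (number of genes × number of conditions) are illustrative assumptions. The paper delegates concept extraction to LCM [72], and its actual ranking and correlation measures are not reproduced here.

```python
from itertools import combinations


def formal_concepts(context):
    """Enumerate every formal concept (extent, intent) of a binary context.

    `context` maps each object (gene) to the frozenset of attributes
    (conditions) it holds. Closing every attribute subset is exponential,
    so this naive enumeration is only meant for toy examples.
    """
    objects = set(context)
    attributes = frozenset().union(*context.values())

    def extent_of(intent):
        # Genes that hold every condition in `intent`.
        return frozenset(g for g in objects if intent <= context[g])

    def intent_of(extent):
        # Conditions shared by every gene in `extent`.
        if not extent:
            return attributes
        return frozenset.intersection(*(context[g] for g in extent))

    concepts = set()
    for r in range(len(attributes) + 1):
        for subset in combinations(sorted(attributes), r):
            extent = extent_of(frozenset(subset))
            concepts.add((extent, intent_of(extent)))
    return concepts


def top_k_biclusters(context, k, min_genes=2, min_conditions=2):
    """Keep the k non-trivial concepts with the largest area (hypothetical score)."""
    candidates = [
        (ext, itt)
        for ext, itt in formal_concepts(context)
        if len(ext) >= min_genes and len(itt) >= min_conditions
    ]
    # Largest area first; ties broken deterministically by sorted labels.
    candidates.sort(key=lambda c: (-len(c[0]) * len(c[1]), sorted(c[0]), sorted(c[1])))
    return candidates[:k]


# Hypothetical toy context: gene -> conditions in which it is, e.g., over-expressed.
context = {
    "g1": frozenset({"c1", "c2", "c3"}),
    "g2": frozenset({"c1", "c2", "c3"}),
    "g3": frozenset({"c1", "c2"}),
    "g4": frozenset({"c3", "c4"}),
}

for genes, conditions in top_k_biclusters(context, k=2):
    print(sorted(genes), sorted(conditions))
```

On this toy context the sketch prints `['g1', 'g2'] ['c1', 'c2', 'c3']` followed by `['g1', 'g2', 'g3'] ['c1', 'c2']`; both biclusters have area 6, and the tie is broken by gene labels.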
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs13042-023-01949-9/MediaObjects/13042_2023_1949_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs13042-023-01949-9/MediaObjects/13042_2023_1949_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs13042-023-01949-9/MediaObjects/13042_2023_1949_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs13042-023-01949-9/MediaObjects/13042_2023_1949_Fig4_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs13042-023-01949-9/MediaObjects/13042_2023_1949_Fig5_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs13042-023-01949-9/MediaObjects/13042_2023_1949_Fig6_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs13042-023-01949-9/MediaObjects/13042_2023_1949_Fig7_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs13042-023-01949-9/MediaObjects/13042_2023_1949_Fig8_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs13042-023-01949-9/MediaObjects/13042_2023_1949_Fig9_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs13042-023-01949-9/MediaObjects/13042_2023_1949_Fig10_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs13042-023-01949-9/MediaObjects/13042_2023_1949_Fig11_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs13042-023-01949-9/MediaObjects/13042_2023_1949_Fig12_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs13042-023-01949-9/MediaObjects/13042_2023_1949_Fig13_HTML.png)
Availability of data and material
Links to the data used in the experiments are given in the paper.
Code availability
Not applicable.
Notes
TOPSIS stands for Technique for Order of Preference by Similarity to Ideal Solution [36].
We use a separator-free abbreviated form for the sets, e.g., \(\{I_{1}I_{2}I_{3}\}\) stands for the set of items \(\{I_1, I_2, I_3\}\).
Formal concepts are extracted by invoking the efficient LCM algorithm [72].
\(nb_1\) is the number of FCs extracted in Line 9 and \(nb_2\) is the number of FCs extracted in Line 10, i.e., \(nb = \max(nb_1, nb_2)\).
Available at https://github.com/mehdi-kaytoue/trimax.
Publicly available at http://arep.med.harvard.edu/biclustering/.
Publicly available at http://arep.med.harvard.edu/biclustering/.
Publicly available at http://www.tik.ethz.ch/sop/bimax/.
Available at http://llama.mshri.on.ca/funcassociate/.
The best biclusters have an adjusted p-value less than 0.001%.
Publicly available at https://www.yeastgenome.org/goTermFinder.
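The TOPSIS note refers to the multi-criteria ranking method of Hwang and Yoon [36]. The Python function below is a minimal sketch of the textbook procedure: vector-normalize and weight each criterion, locate the ideal and anti-ideal points, and score each alternative by its relative closeness to the ideal. This section does not state which criteria or weights Top-BicMiner passes to TOPSIS, so the candidate values, the two criteria, and the weights in the example are hypothetical.

```python
import math


def topsis(matrix, weights, benefit):
    """Score alternatives with TOPSIS (Hwang and Yoon).

    matrix[i][j]: value of alternative i on criterion j.
    weights[j]:   weight of criterion j.
    benefit[j]:   True if larger values are better, False if smaller are better.
    Returns one relative closeness score in [0, 1] per alternative (higher is better).
    """
    n_crit = len(weights)
    # 1. Vector-normalize each criterion column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    weighted = [
        [weights[j] * (row[j] / norms[j] if norms[j] else 0.0) for j in range(n_crit)]
        for row in matrix
    ]
    # 2. Ideal (best per criterion) and anti-ideal (worst per criterion) points.
    columns = list(zip(*weighted))
    ideal = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    anti_ideal = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]
    # 3. Euclidean distances to both points, then relative closeness.
    scores = []
    for row in weighted:
        d_best = math.dist(row, ideal)
        d_worst = math.dist(row, anti_ideal)
        total = d_best + d_worst
        scores.append(d_worst / total if total else 0.0)
    return scores


# Hypothetical candidates described by (number of genes, incoherence score):
# more genes is better, lower incoherence is better.
candidates = [[10, 0.2], [4, 0.1], [8, 0.5]]
scores = topsis(candidates, weights=[0.5, 0.5], benefit=[True, False])
ranking = sorted(range(len(candidates)), key=lambda i: -scores[i])
print(ranking)
```

Here the first candidate ranks highest because it is best on size and close to best on incoherence. Keeping only the first K indices of `ranking` would give a Top-K selection under these assumed criteria.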
References
Ayadi W, Elloumi M, Hao JK (2009) A biclustering algorithm based on a bicluster enumeration tree: application to dna microarray data. BioData Min 2:9
Ayadi W, Elloumi M, Hao JK (2012) Bicfinder: a biclustering algorithm for microarray data analysis. Knowl Inf Syst 30(2):341–358
Ayadi W, Hao J (2014) A memetic algorithm for discovering negative correlation biclusters of DNA microarray data. Neurocomputing 145:14–22. https://doi.org/10.1016/j.neucom.2014.05.074
Barbut M, Monjardet B (1970) Ordre et classification: algèbre et combinatoire. Classiques Hachette. Hachette. https://books.google.fr/books?id=n3BpSgAACAAJ
Behera N, Sinha S (2022) Extracting the candidate genes for cancer from the microarray gene expression data by stochastic computation
Ben-Dor A, Chor B, Karp RM, Yakhini Z (2003) Discovering local structure in gene expression data: the order-preserving submatrix problem. J Comput Biol 10(3/4):373–384
Bergmann S, Ihmels J, Barkai N (2004) Defining transcription modules using large-scale gene expression data. Bioinformatics 20(13):1993–2003
Besson J, Robardet C, Boulicaut J, Rome S (2005) Constraint-based concept mining and its application to microarray data analysis. Intell Data Anal 9(1):59–82
Bogdanović M, Gligorijević MF, Veljković N, Puflović D, Stoimenov L (2023) Cross-portal metadata alignment-connecting open data portals through means of formal concept analysis. Inf Sci 118958
Bouasker S, Ben Yahia S, Diallo G (2019) An insight into biological datamining based on rarity and correlation as constraints. In: Hung C, Papadopoulos GA (eds) Proceedings of the 34th ACM/SIGAPP symposium on applied computing, SAC 2019, Limassol, Cyprus, April 8–12, 2019. ACM, pp 3–10. https://doi.org/10.1145/3297280.3297281
Bouasker S, Inoubli W, Yahia SB, Diallo G (2021) Pregnancy associated breast cancer gene expressions: new insights on their regulation based on rare correlated patterns. IEEE ACM Trans Comput Biol Bioinform 18(3):1035–1048. https://doi.org/10.1109/TCBB.2020.3015236
Burgos-Salcedo J (2021) A comparative analysis of clinical stage 3 covid-19 vaccines using knowledge representation. medRxiv 2021–03
Buzmakov A, Egho E, Jay N, Kuznetsov SO, Napoli A, Raïssi C (2016) On mining complex sequential data by means of fca and pattern structures. Int J Gen Syst 45(2):135–159. https://doi.org/10.1080/03081079.2015.1072925
Buzmakov A, Kuznetsov SO, Napoli A (2015) Fast generation of best interval patterns for nonmonotonic constraints. CoRR abs/1506.01071
Madeira SC, Oliveira AL (2004) Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Trans Comput Biol Bioinform 1(1):24–45
Cheng Y, Church GM (2000) Biclustering of expression data. In: Proceedings of ISMB, UC San Diego, California, pp 93–103
Berrar DP, Dubitzky W, Granzow M (2003) A practical approach to microarray data analysis
Ganter B, Wille R (1999) Formal concept analysis—mathematical foundations. Springer, Berlin
Gasch AP, Eisen MB (2002) Exploring the conditional coregulation of yeast gene expression through fuzzy k-means clustering. Genome Biol 3(11) (research0059.1). https://doi.org/10.1186/gb-2002-3-11-research0059
Ghosh M, Roy A, Mondal KC (2022) Fca-based constant and coherent-signed bicluster identification and its application in biodiversity study. In: Proceedings of international conference on advanced computing applications: ICACA 2021. Springer, pp 679–691
Hao F, Min G, Pei Z, Park DS, Yang LT (2015) \(k\)-clique community detection in social networks based on formal concept analysis. IEEE Syst J 11(1):250–259
Hao F, Park DS, Min G, Jeong YS, Park JH (2016) k-cliques mining in dynamic social networks based on triadic formal concept analysis. Neurocomputing 209:57–66
Hao F, Sun Y, Lin Y (2022) Rough maximal cliques enumeration in incomplete graphs based on partially-known concept learning. Neurocomputing 496:96–106
Henriques R, Antunes C, Madeira SC (2013) Methods for the efficient discovery of large item-indexable sequential patterns. In: New Frontiers in Mining Complex Patterns - Second International Workshop, NFMCP 2013, Held in Conjunction with ECML-PKDD 2013, Prague, Czech Republic, September 27, 2013, Revised Selected Papers, pp 100–116. https://doi.org/10.1007/978-3-319-08407-7_7
Henriques R, Antunes C, Madeira SC (2015) A structured view on pattern mining-based biclustering. Pattern Recognit 48(12):3941–3958. https://doi.org/10.1016/j.patcog.2015.06.018
Henriques R, Ferreira FL, Madeira SC (2017) Bicpams: software for biological data analysis with pattern-based biclustering. BMC Bioinform 18(1):82:1–82:16. https://doi.org/10.1186/s12859-017-1493-3
Henriques R, Madeira SC (2014) Bicpam: pattern-based biclustering for biomedical data analysis. Algor Mol Biol 9:27. https://doi.org/10.1186/s13015-014-0027-z
Henriques R, Madeira SC (2014) Bicspam: flexible biclustering using sequential patterns. BMC Bioinform 15:130. https://doi.org/10.1186/1471-2105-15-130
Henriques R, Madeira SC (2016) Bic2pam: constraint-guided biclustering for biological data analysis with domain knowledge. Algor Mol Biol 11:23. https://doi.org/10.1186/s13015-016-0085-5
Henriques R, Madeira SC (2016) Bicnet: flexible module discovery in large-scale biological networks using biclustering. Algor Mol Biol 11:14. https://doi.org/10.1186/s13015-016-0074-8
Houari A, Ayadi W, Ben Yahia S (2015) Discovering low overlapping biclusters in gene expression data through generic association rules. In: Bellatreche L, Manolopoulos Y (eds) Model and data engineering—5th international conference, MEDI 2015, Rhodes, Greece, September 26–28, 2015, Proceedings, lecture notes in computer science, vol 9344. Springer, pp 139–153. https://doi.org/10.1007/978-3-319-23781-7_12
Houari A, Ayadi W, Ben Yahia S (2017) Mining negative correlation biclusters from gene expression data using generic association rules. In: Zanni-Merk C, Frydman CS, Toro C, Hicks Y, Howlett RJ, Jain LC (eds) Knowledge-based and intelligent information & engineering systems: Proceedings of the 21st international conference KES-2017, Marseille, France, 6–8 September 2017, Procedia computer science, vol 112. Elsevier, pp 278–287. https://doi.org/10.1016/j.procs.2017.08.262
Houari A, Ayadi W, Ben Yahia S (2018) NBF: an fca-based algorithm to identify negative correlation biclusters of DNA microarray data. In: Barolli L, Takizawa M, Enokido T, Ogiela MR, Ogiela L, Javaid N (eds) 32nd IEEE international conference on advanced information networking and applications, AINA 2018, Krakow, Poland, May 16–18, 2018. IEEE Computer Society, pp 1003–1010. https://doi.org/10.1109/AINA.2018.00146
Houari A, Ayadi W, Ben Yahia S (2018) A new fca-based method for identifying biclusters in gene expression data. Int J Mach Learn Cybern 9(11):1879–1893. https://doi.org/10.1007/s13042-018-0794-9
Houari A, Ben Yahia S (2021) Top-k formal concepts for identifying positively and negatively correlated biclusters. In: Attiogbé JC, Yahia SB (eds) Model and data engineering—10th international conference, MEDI 2021, Tallinn, Estonia, June 21–23, 2021, Proceedings, lecture notes in computer science, vol 12732. Springer, pp 156–172. https://doi.org/10.1007/978-3-030-78428-7_13
Hwang CL, Yoon K (1981) Methods for multiple attribute decision making. In: Multiple attribute decision making. Springer, pp 58–191
Ignatov DI, Khvorykh GV, Khrunin AV, Nikolić S, Shaban M, Petrova EA, Koltsova EA, Takelait F, Egurnov D (2021) Object-attribute biclustering for elimination of missing genotypes in ischemic stroke genome-wide data. In: Recent trends in analysis of images, social networks and texts: 9th international conference, AIST 2020, Skolkovo, Moscow, Russia, October 15–16, 2020 revised supplementary Proceedings 9. Springer, pp 185–204
Iqbal N, Kumar P (2023) From data science to bioscience: emerging era of bioinformatics applications, tools and challenges. Procedia Comput Sci 218:1516–1528
Juniarta N (2019) Mining complex data and biclustering using formal concept analysis. Theses, Université de Lorraine. https://hal.inria.fr/tel-02426034
Juniarta N, Couceiro M, Napoli A (2019) A unified approach to biclustering based on formal concept analysis and interval pattern structure. In: Discovery science: 22nd international conference, DS 2019, Split, Croatia, October 28–30, 2019, Proceedings 22. Springer, pp 51–60
Juniarta N, Couceiro M, Napoli A (2020) Order-preserving biclustering based on fca and pattern structures. In: Complex pattern mining
Kataria S, Batra U (2022) Co-clustering neighborhood-based collaborative filtering framework using formal concept analysis. Int J Inf Technol 14(4):1725–1731
Kaytoue M, Kuznetsov SO, Macko J, Napoli A (2014) Biclustering meets triadic concept analysis. Ann Math Artif Intell 70(1–2):55–79. https://doi.org/10.1007/s10472-013-9379-1
Kaytoue M, Kuznetsov SO, Napoli A (2011) Biclustering numerical data in formal concept analysis. In: Proceedings of ICFCA, Leuven, Belgium, pp 135–150
Kaytoue M, Kuznetsov SO, Napoli A, Duplessis S (2011) Mining gene expression data with pattern structures in formal concept analysis. Inf Sci 181(10):1989–2001. https://doi.org/10.1016/j.ins.2010.07.007
Kuznetsov SO (1996) Mathematical aspects of concept analysis. J Math Sci 80(2):1654–1698
Kuznetsov SO (2007) On stability of a formal concept. Ann Math Artif Intell 49(1–4):101–115
Kuznetsov SO (2013) Fitting pattern structures to knowledge discovery in big data. In: Formal concept analysis: 11th international conference, ICFCA 2013, Dresden, Germany, May 21–24, 2013. Proceedings 11. Springer, pp 254–266
Kuznetsov SO, Makhazhanov N, Ushakov M (2017) On neural network architecture based on concept lattices. In: Foundations of intelligent systems: 23rd international symposium, ISMIS 2017, Warsaw, Poland, June 26–29, 2017, Proceedings 23. Springer, pp 653–663
Lehmann F, Wille R (1995) A triadic approach to formal concept analysis. In: Conceptual structures: applications, implementation and theory, third international conference on conceptual structures, ICCS ’95, Santa Cruz, California, USA, August 14–18, 1995, Proceedings, pp 32–43. https://doi.org/10.1007/3-540-60161-9_27
Li G, Ma Q, Tang H, Paterson AH, Xu Y (2009) Qubic: a qualitative biclustering algorithm for analyses of gene expression data. Nucl Acids Res gkp491
Li J, Liu Q, Zeng T (2010) Negative correlations in collaboration: concepts and algorithms. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, Washington, DC, USA, July 25–28, 2010, pp 463–472. https://doi.org/10.1145/1835804.1835864
Luan Y, Li H (2003) Clustering of time-course gene expression data using a mixed-effects model with b-splines. Bioinformatics 19(4):474–482
Madeira SC, Oliveira AL (2009) A polynomial time biclustering algorithm for finding approximate expression patterns in gene expression time series. Algor Mol Biol. https://doi.org/10.1186/1748-7188-4-8
Madeira SC, Teixeira MC, Sá-Correia I, Oliveira AL (2010) Identification of regulatory modules in time series gene expression data using a linear time biclustering algorithm. IEEE/ACM Trans Comput Biol Bioinform 7(1):153–165. https://doi.org/10.1145/1719272.1719289
Mandal K, Sarmah R, Bhattacharyya DK (2020) Popbic: pathway-based order preserving biclustering algorithm towards the analysis of gene expression data. IEEE/ACM Trans Comput Biol Bioinform 18(6):2659–2670
Martínez R, Pasquier N, Pasquier C (2008) Genminer: mining non-redundant association rules from integrated gene expression data and annotations. Bioinformatics 24(22):2643–2644. https://doi.org/10.1093/bioinformatics/btn490
Mondal KC, Pasquier N (2014) Galois closure based association rule mining from biological data. In: Elloumi M, Zomaya AY (eds) Biological knowledge discovery handbook: preprocessing, mining, and postprocessing of biological data, pp 761–802
Mouakher A, Ben Yahia S (2019) On the efficient stability computation for the selection of interesting formal concepts. Inf Sci 472:15–34
Mouakher A, Ko A (2022) Efficient assessment of formal concept stability in the galois lattice. Int J Gen Syst 51(8):791–821. https://doi.org/10.1080/03081079.2022.2084728
Murali T, Kasif S (2002) Extracting conserved gene expression motifs from gene expression data. In: Biocomputing 2003. World Scientific, pp 77–88
Nepomuceno JA, Lora AT, Nepomuceno-Chamorro IA, Aguilar-Ruiz JS (2015) Integrating biological knowledge based on functional annotations for biclustering of gene expression data. Comput Methods Prog Biomed 119(3):163–180. https://doi.org/10.1016/j.cmpb.2015.02.010
Nepomuceno JA, Troncoso A, Aguilar-Ruiz JS (2015) Scatter search-based identification of local patterns with positive and negative correlations in gene expression data. Appl Soft Comput 35:637–651. https://doi.org/10.1016/j.asoc.2015.06.019
Odibat O, Reddy CK (2014) Efficient mining of discriminative co-clusters from gene expression data. Knowl Inf Syst 41(3):667–696. https://doi.org/10.1007/s10115-013-0684-0
Peddada S, Lobenhofer E, Li L, Afshari C, Weinberg C (2003) Gene selection and clustering for time-course and dose-response microarray experiments using order-restricted inference. Bioinformatics 19:834–841
Pensa RG, Besson J, Boulicaut JF (2004) A methodology for biologically relevant pattern discovery from gene expression data. In: Proceedings of discovery science, pp 230–241
Prelic A, Bleuler S, Zimmermann P, Wille A, Bühlmann P, Gruissem W, Hennig L, Thiele L, Zitzler E (2006) A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics 22(9):1122–1129
Roscoe S, Khatri M, Voshall A, Batra S, Kaur S, Deogun J (2022) Formal concept analysis applications in bioinformatics. ACM Comput Surv 55(8):1–40
Roy S, Bhattacharyya DK, Kalita JK (2013) Cobi: pattern based co-regulated biclustering of gene expression data. Pattern Recognit Lett 34(14):1669–1678
Trabelsi C, Jelassi N, Ben Yahia S (2012) Scalable mining of frequent tri-concepts from folksonomies. In: Advances in knowledge discovery and data mining—16th Pacific–Asia conference, PAKDD 2012, Kuala Lumpur, Malaysia, May 29–June 1, 2012, Proceedings, Part II. Springer, pp 231–242. https://doi.org/10.1007/978-3-642-30220-6_20
Tu X, Wang Y, Zhang M, Wu J (2016) Using formal concept analysis to identify negative correlations in gene expression data. IEEE/ACM Trans Comput Biol Bioinform 13(2):380–391. https://doi.org/10.1109/TCBB.2015.2443805
Uno T, Asai T, Uchida Y, Arimura H (2004) An efficient algorithm for enumerating closed patterns in transaction databases. In: Discovery science, 7th international conference, DS 2004, Padova, Italy, October 2–5, 2004, Proceedings, pp 16–31. https://doi.org/10.1007/978-3-540-30214-8_2
Wang H, Wang W, Yang J, Yu PS (2002) Clustering by pattern similarity in large data sets. In: Proceedings of the 2002 ACM SIGMOD international conference on management of data, Madison, Wisconsin, June 3–6, 2002, pp 394–405. https://doi.org/10.1145/564691.564737
Wei J, Wang S, Yuan X (2010) Ensemble rough hypercuboid approach for classifying cancers. IEEE Trans Knowl Data Eng 22(3):381–391. https://doi.org/10.1109/TKDE.2009.114
Zeng T, Li J (2010) Maximization of negative correlations in time-course gene expression data for enhancing understanding of molecular pathways. Nucl Acids Res 38(1):e1
Zhao Y, Yu J, Wang G, Chen L, Wang B, Yu G (2008) Maximal subspace coregulated gene clustering. IEEE Trans Knowl Data Eng 20(1):83–98. https://doi.org/10.1109/TKDE.2007.190670
Zhou H, Lin W, Labra SR, Lipton SA, Schork NJ, Rangan AV (2022) Detecting Boolean asymmetric relationships with a loop counting technique and its implications for analyzing heterogeneity within gene expression datasets. bioRxiv 2022–08
Funding
Not applicable.
Author information
Contributions
AH contributed to the idea, algorithm, theoretical analysis, writing, and experiments. SB contributed to the theoretical analysis and writing.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Ethical approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This paper is an extended version of our work presented in [35].
Supplementary Information
Below is the link to the electronic supplementary material.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Houari, A., Ben Yahia, S. A Top-K formal concepts-based algorithm for mining positive and negative correlation biclusters of DNA microarray data. Int. J. Mach. Learn. & Cyber. 15, 941–962 (2024). https://doi.org/10.1007/s13042-023-01949-9
DOI: https://doi.org/10.1007/s13042-023-01949-9