A Top-K formal concepts-based algorithm for mining positive and negative correlation biclusters of DNA microarray data

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

As biological data become more widely available, analyzing and understanding these large and complex volumes is a challenging task. In biomedical research, gene expression data are among the most commonly used biological data. Formal concept analysis is frequently used to identify differentially expressed genes in microarray data, and Top-K formal concepts are effective at selecting the most useful formal concepts. To our knowledge, no existing algorithm can complete the difficult task of identifying only important biclusters. For this purpose, we propose Top-BicMiner, a new Top-K formal concepts-based algorithm for mining biclusters from gene expression data. It extracts sets of biclusters with positively and negatively correlated genes according to distinct correlation measures. The proposed method is applied to both synthetic and real-life microarray datasets. The experimental results highlight Top-BicMiner’s ability to identify statistically and biologically significant biclusters.
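
To make the terms in the abstract concrete, the following is a minimal sketch of what a formal concept, a top-k selection, and a correlation sign look like on a toy gene-by-condition context. It is not Top-BicMiner itself: the brute-force enumeration, the area score used for ranking, the plain Pearson coefficient, and every name (`formal_concepts`, `top_k`, `pearson`, `toy_context`) are assumptions made for illustration only.

```python
from itertools import combinations


def formal_concepts(context):
    """Enumerate all formal concepts (extent, intent) of a binary context.

    `context` maps each object (gene) to the set of attributes (conditions)
    it has. Brute force over attribute subsets: toy inputs only.
    """
    objects = sorted(context)
    attributes = sorted(set().union(*context.values()))
    concepts = set()
    for r in range(len(attributes) + 1):
        for attrs in combinations(attributes, r):
            attrs = set(attrs)
            # Extent: objects that have every attribute in attrs.
            extent = frozenset(g for g in objects if attrs <= context[g])
            # Intent: attributes shared by every object of the extent.
            if extent:
                intent = frozenset(set.intersection(*(context[g] for g in extent)))
            else:
                intent = frozenset(attributes)
            concepts.add((extent, intent))
    return concepts


def top_k(concepts, k):
    """Keep the k concepts covering the most cells (hypothetical area score)."""
    def key(c):
        return (-len(c[0]) * len(c[1]), sorted(c[0]), sorted(c[1]))
    return sorted(concepts, key=key)[:k]


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Toy context: which genes are "expressed" under which conditions.
toy_context = {
    "g1": {"c1", "c2"},
    "g2": {"c1", "c2", "c3"},
    "g3": {"c3"},
}
concepts = formal_concepts(toy_context)
best = top_k(concepts, 1)[0]
print(sorted(best[0]), sorted(best[1]))
```

Each concept pairs a maximal set of genes with the maximal set of conditions they share, which is why concepts are natural bicluster candidates. A negative value of `pearson` for two gene profiles would mark them as negatively correlated, a positive value as positively correlated.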

Availability of data and material

Links to the data used in the experiments are given in the paper.

Code availability

Not applicable.

Notes

  1. TOPSIS stands for Technique for Order of Preference by Similarity to Ideal Solution [36].

  2. We use a separator-free abbreviated form for the sets, e.g., \(\{I_{1}I_{2}I_{3}\}\) stands for the set of items \(\{I_1, I_2, I_3\}\).

  3. The extraction of the formal concepts is carried out through the invocation of the efficient LCM algorithm [72].

  4. nb1 is the number of FCs extracted in Line 9 and nb2 is the number of FCs extracted in Line 10 (i.e., \(nb=\max(nb1,nb2)\)).

  5. Available at https://github.com/mehdi-kaytoue/trimax.

  6. Publicly available at http://arep.med.harvard.edu/biclustering/.

  7. Publicly available at http://arep.med.harvard.edu/biclustering/.

  8. Publicly available at http://www.tik.ethz.ch/sop/bimax/.

  9. Available at http://llama.mshri.on.ca/funcassociate/.

  10. The best biclusters have an adjusted p-value less than 0.001%.

  11. Publicly available at https://www.yeastgenome.org/goTermFinder.

  12. http://geneontology.org/.
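
Note 1 refers to TOPSIS, a multi-criteria ranking method. The sketch below shows the standard TOPSIS steps (vector normalization, weighting, distance to the ideal best and worst alternatives, closeness score) under the assumption that every criterion is a benefit criterion. The function name `topsis`, the example scores, and the weights are hypothetical; how the paper applies TOPSIS to formal concepts is not visible in this preview.

```python
import math


def topsis(matrix, weights):
    """Return one closeness score in [0, 1] per row; higher is better.

    Assumes every criterion is a benefit criterion (larger is better),
    no column is all zeros, and the rows are not all identical.
    """
    n_crit = len(weights)
    # Vector-normalize each column, then apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    weighted = [
        [weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix
    ]
    # Ideal best and ideal worst alternative, criterion by criterion.
    best = [max(col) for col in zip(*weighted)]
    worst = [min(col) for col in zip(*weighted)]
    scores = []
    for row in weighted:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Three hypothetical candidates scored on two criteria with equal weights.
scores = topsis([[3, 3], [1, 1], [2, 2]], [0.5, 0.5])
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
print(ranking)
```

Candidates are then ranked by decreasing closeness, so a top-k selection simply keeps the first k indices of `ranking`.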

References

  1. Ayadi W, Elloumi M, Hao JK (2009) A biclustering algorithm based on a bicluster enumeration tree: application to DNA microarray data. BioData Min 2:9

  2. Ayadi W, Elloumi M, Hao JK (2012) Bicfinder: a biclustering algorithm for microarray data analysis. Knowl Inf Syst 30(2):341–358

  3. Ayadi W, Hao J (2014) A memetic algorithm for discovering negative correlation biclusters of DNA microarray data. Neurocomputing 145:14–22. https://doi.org/10.1016/j.neucom.2014.05.074

  4. Barbut M, Monjardet B (1970) Ordre et classification: algèbre et combinatoire. Classiques Hachette. Hachette. https://books.google.fr/books?id=n3BpSgAACAAJ

  5. Behera N, Sinha S (2022) Extracting the candidate genes for cancer from the microarray gene expression data by stochastic computation

  6. Ben-Dor A, Chor B, Karp RM, Yakhini Z (2003) Discovering local structure in gene expression data: the order-preserving submatrix problem. J Comput Biol 10(3/4):373–384

  7. Bergmann S, Ihmels J, Barkai N (2004) Defining transcription modules using large-scale gene expression data. Bioinformatics 20(13):1993–2003

  8. Besson J, Robardet C, Boulicaut J, Rome S (2005) Constraint-based concept mining and its application to microarray data analysis. Intell Data Anal 9(1):59–82

  9. Bogdanović M, Gligorijević MF, Veljković N, Puflović D, Stoimenov L (2023) Cross-portal metadata alignment-connecting open data portals through means of formal concept analysis. Inf Sci 118958

  10. Bouasker S, Ben Yahia S, Diallo G (2019) An insight into biological datamining based on rarity and correlation as constraints. In: Hung C, Papadopoulos GA (eds) Proceedings of the 34th ACM/SIGAPP symposium on applied computing, SAC 2019, Limassol, Cyprus, April 8–12, 2019. ACM, pp 3–10. https://doi.org/10.1145/3297280.3297281

  11. Bouasker S, Inoubli W, Yahia SB, Diallo G (2021) Pregnancy associated breast cancer gene expressions: new insights on their regulation based on rare correlated patterns. IEEE ACM Trans Comput Biol Bioinform 18(3):1035–1048. https://doi.org/10.1109/TCBB.2020.3015236

  12. Burgos-Salcedo J (2021) A comparative analysis of clinical stage 3 covid-19 vaccines using knowledge representation. medRxiv 2021–03

  13. Buzmakov A, Egho E, Jay N, Kuznetsov SO, Napoli A, Raïssi C (2016) On mining complex sequential data by means of fca and pattern structures. Int J Gen Syst 45(2):135–159. https://doi.org/10.1080/03081079.2015.1072925

  14. Buzmakov A, Kuznetsov SO, Napoli A (2015) Fast generation of best interval patterns for nonmonotonic constraints. CoRR abs/1506.01071

  15. Madeira SC, Oliveira AL (2004) Biclustering algorithms for biological data analysis: a survey. IEEE Trans Comput Biol Bioinform 1:24–45

  16. Cheng Y, Church GM (2000) Biclustering of expression data. In: Proceedings of ISMB, UC San Diego, California, pp 93–103

  17. Daniel PB, Werner D, Martin G (2003) Practical approach to microarray data analysis

  18. Ganter B, Wille R (1999) Formal concept analysis—mathematical foundations. Springer, Berlin

  19. Gasch AP, Eisen MB (2002) Exploring the conditional coregulation of yeast gene expression through fuzzy k-means clustering. Genome Biol 3(11) (research0059.1). https://doi.org/10.1186/gb-2002-3-11-research0059

  20. Ghosh M, Roy A, Mondal KC (2022) Fca-based constant and coherent-signed bicluster identification and its application in biodiversity study. In: Proceedings of international conference on advanced computing applications: ICACA 2021. Springer, pp 679–691

  21. Hao F, Min G, Pei Z, Park DS, Yang LT (2015) \(k\)-clique community detection in social networks based on formal concept analysis. IEEE Syst J 11(1):250–259

  22. Hao F, Park DS, Min G, Jeong YS, Park JH (2016) k-cliques mining in dynamic social networks based on triadic formal concept analysis. Neurocomputing 209:57–66

  23. Hao F, Sun Y, Lin Y (2022) Rough maximal cliques enumeration in incomplete graphs based on partially-known concept learning. Neurocomputing 496:96–106

  24. Henriques R, Antunes C, Madeira SC (2013) Methods for the efficient discovery of large item-indexable sequential patterns. In: New Frontiers in Mining Complex Patterns - Second International Workshop, NFMCP 2013, Held in Conjunction with ECML-PKDD 2013, Prague, Czech Republic, September 27, 2013, Revised Selected Papers, pp 100–116. https://doi.org/10.1007/978-3-319-08407-7_7

  25. Henriques R, Antunes C, Madeira SC (2015) A structured view on pattern mining-based biclustering. Pattern Recognit 48(12):3941–3958. https://doi.org/10.1016/j.patcog.2015.06.018

  26. Henriques R, Ferreira FL, Madeira SC (2017) Bicpams: software for biological data analysis with pattern-based biclustering. BMC Bioinform 18(1):82:1–82:16. https://doi.org/10.1186/s12859-017-1493-3

  27. Henriques R, Madeira SC (2014) Bicpam: pattern-based biclustering for biomedical data analysis. Algor Mol Biol 9:27. https://doi.org/10.1186/s13015-014-0027-z

  28. Henriques R, Madeira SC (2014) Bicspam: flexible biclustering using sequential patterns. BMC Bioinform 15:130. https://doi.org/10.1186/1471-2105-15-130

  29. Henriques R, Madeira SC (2016) Bic2pam: constraint-guided biclustering for biological data analysis with domain knowledge. Algor Mol Biol 11:23. https://doi.org/10.1186/s13015-016-0085-5

  30. Henriques R, Madeira SC (2016) Bicnet: flexible module discovery in large-scale biological networks using biclustering. Algor Mol Biol 11:14. https://doi.org/10.1186/s13015-016-0074-8

  31. Houari A, Ayadi W, Ben Yahia S (2015) Discovering low overlapping biclusters in gene expression data through generic association rules. In: Bellatreche L, Manolopoulos Y (eds) Model and data engineering—5th international conference, MEDI 2015, Rhodes, Greece, September 26–28, 2015, Proceedings, lecture notes in computer science, vol 9344. Springer, pp 139–153. https://doi.org/10.1007/978-3-319-23781-7_12

  32. Houari A, Ayadi W, Ben Yahia S (2017) Mining negative correlation biclusters from gene expression data using generic association rules. In: Zanni-Merk C, Frydman CS, Toro C, Hicks Y, Howlett RJ, Jain LC (eds) Knowledge-based and intelligent information & engineering systems: Proceedings of the 21st international conference KES-2017, Marseille, France, 6–8 September 2017, Procedia computer science, vol 112. Elsevier, pp 278–287. https://doi.org/10.1016/j.procs.2017.08.262

  33. Houari A, Ayadi W, Ben Yahia S (2018) NBF: an fca-based algorithm to identify negative correlation biclusters of DNA microarray data. In: Barolli L, Takizawa M, Enokido T, Ogiela MR, Ogiela L, Javaid N (eds) 32nd IEEE international conference on advanced information networking and applications, AINA 2018, Krakow, Poland, May 16–18, 2018. IEEE Computer Society, pp 1003–1010. https://doi.org/10.1109/AINA.2018.00146

  34. Houari A, Ayadi W, Ben Yahia S (2018) A new fca-based method for identifying biclusters in gene expression data. Int J Mach Learn Cybern 9(11):1879–1893. https://doi.org/10.1007/s13042-018-0794-9

  35. Houari A, Ben Yahia S (2021) Top-k formal concepts for identifying positively and negatively correlated biclusters. In: Attiogbé JC, Yahia SB (eds) Model and data engineering—10th international conference, MEDI 2021, Tallinn, Estonia, June 21–23, 2021, Proceedings, lecture notes in computer science, vol 12732. Springer, pp 156–172. https://doi.org/10.1007/978-3-030-78428-7_13

  36. Hwang CL, Yoon K (1981) Methods for multiple attribute decision making. In: Multiple attribute decision making. Springer, pp 58–191

  37. Ignatov DI, Khvorykh GV, Khrunin AV, Nikolić S, Shaban M, Petrova EA, Koltsova EA, Takelait F, Egurnov D (2021) Object-attribute biclustering for elimination of missing genotypes in ischemic stroke genome-wide data. In: Recent trends in analysis of images, social networks and texts: 9th international conference, AIST 2020, Skolkovo, Moscow, Russia, October 15–16, 2020 revised supplementary Proceedings 9. Springer, pp 185–204

  38. Iqbal N, Kumar P (2023) From data science to bioscience: emerging era of bioinformatics applications, tools and challenges. Procedia Comput Sci 218:1516–1528

  39. Juniarta N (2019) Mining complex data and biclustering using formal concept analysis. Theses, Université de Lorraine. https://hal.inria.fr/tel-02426034

  40. Juniarta N, Couceiro M, Napoli A (2019) A unified approach to biclustering based on formal concept analysis and interval pattern structure. In: Discovery science: 22nd international conference, DS 2019, Split, Croatia, October 28–30, 2019, Proceedings 22. Springer, pp 51–60

  41. Juniarta N, Couceiro M, Napoli A (2020) Order-preserving biclustering based on fca and pattern structures. In: Complex pattern mining

  42. Kataria S, Batra U (2022) Co-clustering neighborhood-based collaborative filtering framework using formal concept analysis. Int J Inf Technol 14(4):1725–1731

  43. Kaytoue M, Kuznetsov SO, Macko J, Napoli A (2014) Biclustering meets triadic concept analysis. Ann Math Artif Intell 70(1–2):55–79. https://doi.org/10.1007/s10472-013-9379-1

  44. Kaytoue M, Kuznetsov SO, Napoli A (2011) Biclustering numerical data in formal concept analysis. In: Proceedings of ICFCA, Leuven, Belgium, pp 135–150

  45. Kaytoue M, Kuznetsov SO, Napoli A, Duplessis S (2011) Mining gene expression data with pattern structures in formal concept analysis. Inf Sci 181(10):1989–2001. https://doi.org/10.1016/j.ins.2010.07.007

  46. Kuznetsov SO (1996) Mathematical aspects of concept analysis. J Math Sci 80(2):1654–1698

  47. Kuznetsov SO (2007) On stability of a formal concept. Ann Math Artif Intell 49(1–4):101–115

  48. Kuznetsov SO (2013) Fitting pattern structures to knowledge discovery in big data. In: Formal concept analysis: 11th international conference, ICFCA 2013, Dresden, Germany, May 21–24, 2013. Proceedings 11. Springer, pp 254–266

  49. Kuznetsov SO, Makhazhanov N, Ushakov M (2017) On neural network architecture based on concept lattices. In: Foundations of intelligent systems: 23rd international symposium, ISMIS 2017, Warsaw, Poland, June 26–29, 2017, Proceedings 23. Springer, pp 653–663

  50. Lehmann F, Wille R (1995) A triadic approach to formal concept analysis. In: Conceptual structures: applications, implementation and theory, third international conference on conceptual structures, ICCS ’95, Santa Cruz, California, USA, August 14–18, 1995, Proceedings, pp 32–43. https://doi.org/10.1007/3-540-60161-9_27

  51. Li G, Ma Q, Tang H, Paterson AH, Xu Y (2009) Qubic: a qualitative biclustering algorithm for analyses of gene expression data. Nucl Acids Res gkp491

  52. Li J, Liu Q, Zeng T (2010) Negative correlations in collaboration: concepts and algorithms. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, Washington, DC, USA, July 25–28, 2010, pp 463–472. https://doi.org/10.1145/1835804.1835864

  53. Luan Y, Li H (2003) Clustering of time-course gene expression data using a mixed-effects model with b-splines. Bioinformatics 19(4):474–482

  54. Madeira SC, Oliveira AL (2009) A polynomial time biclustering algorithm for finding approximate expression patterns in gene expression time series. Algor Mol Biol. https://doi.org/10.1186/1748-7188-4-8

  55. Madeira SC, Teixeira MC, Sá-Correia I, Oliveira AL (2010) Identification of regulatory modules in time series gene expression data using a linear time biclustering algorithm. IEEE/ACM Trans Comput Biol Bioinform 7(1):153–165. https://doi.org/10.1145/1719272.1719289

  56. Mandal K, Sarmah R, Bhattacharyya DK (2020) Popbic: pathway-based order preserving biclustering algorithm towards the analysis of gene expression data. IEEE/ACM Trans Comput Biol Bioinform 18(6):2659–2670

  57. Martínez R, Pasquier N, Pasquier C (2008) Genminer: mining non-redundant association rules from integrated gene expression data and annotations. Bioinformatics 24(22):2643–2644. https://doi.org/10.1093/bioinformatics/btn490

  58. Mondal KC, Pasquier N (2014) Galois closure based association rule mining from biological data. In: Elloumi M, Zomaya AY (eds) Biological knowledge discovery handbook: preprocessing, mining, and postprocessing of biological data, pp 761–802

  59. Mouakher A, Ben Yahia S (2019) On the efficient stability computation for the selection of interesting formal concepts. Inf Sci 472:15–34

  60. Mouakher A, Ko A (2022) Efficient assessment of formal concept stability in the galois lattice. Int J Gen Syst 51(8):791–821. https://doi.org/10.1080/03081079.2022.2084728

  61. Murali T, Kasif S (2002) Extracting conserved gene expression motifs from gene expression data. In: Biocomputing 2003. World Scientific, pp 77–88

  62. Nepomuceno JA, Lora AT, Nepomuceno-Chamorro IA, Aguilar-Ruiz JS (2015) Integrating biological knowledge based on functional annotations for biclustering of gene expression data. Comput Methods Prog Biomed 119(3):163–180. https://doi.org/10.1016/j.cmpb.2015.02.010

  63. Nepomuceno JA, Troncoso A, Aguilar-Ruiz JS (2015) Scatter search-based identification of local patterns with positive and negative correlations in gene expression data. Appl Soft Comput 35:637–651. https://doi.org/10.1016/j.asoc.2015.06.019

  64. Odibat O, Reddy CK (2014) Efficient mining of discriminative co-clusters from gene expression data. Knowl Inf Syst 41(3):667–696. https://doi.org/10.1007/s10115-013-0684-0

  65. Peddada S, Lobenhofer E, Li L, Afshari C, Weinberg C (2003) Gene selection and clustering for time-course and dose-response microarray experiments using order-restricted inference. Bioinformatics 19:834–841

  66. Pensa RG, Besson J, Boulicaut JF (2004) A methodology for biologically relevant pattern discovery from gene expression data. In: Proceedings of discovery science, pp 230–241

  67. Prelic A, Bleuler S, Zimmermann P, Wille A, Buhlmann P, Gruissem W, Hennig L, Thiele L, Zitzler E (2006) A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics 22(9):1122–1129

  68. Roscoe S, Khatri M, Voshall A, Batra S, Kaur S, Deogun J (2022) Formal concept analysis applications in bioinformatics. ACM Comput Surv 55(8):1–40

  69. Roy S, Bhattacharyya DK, Kalita JK (2013) Cobi: pattern based co-regulated biclustering of gene expression data. Pattern Recognit Lett 34(14):1669–1678

  70. Trabelsi C, Jelassi N, Ben Yahia S (2012) Scalable mining of frequent tri-concepts from folksonomies. In: Advances in knowledge discovery and data mining—16th Pacific–Asia conference, PAKDD 2012, Kuala Lumpur, Malaysia, May 29–June 1, 2012, Proceedings, Part II. Springer, pp 231–242. https://doi.org/10.1007/978-3-642-30220-6_20

  71. Tu X, Wang Y, Zhang M, Wu J (2016) Using formal concept analysis to identify negative correlations in gene expression data. IEEE/ACM Trans Comput Biol Bioinform 13(2):380–391. https://doi.org/10.1109/TCBB.2015.2443805

  72. Uno T, Asai T, Uchida Y, Arimura H (2004) An efficient algorithm for enumerating closed patterns in transaction databases. In: Discovery science, 7th international conference, DS 2004, Padova, Italy, October 2–5, 2004, Proceedings, pp 16–31. https://doi.org/10.1007/978-3-540-30214-8_2

  73. Wang H, Wang W, Yang J, Yu PS (2002) Clustering by pattern similarity in large data sets. In: Proceedings of the 2002 ACM SIGMOD international conference on management of data, Madison, Wisconsin, June 3–6, 2002, pp 394–405. https://doi.org/10.1145/564691.564737

  74. Wei J, Wang S, Yuan X (2010) Ensemble rough hypercuboid approach for classifying cancers. IEEE Trans Knowl Data Eng 22(3):381–391. https://doi.org/10.1109/TKDE.2009.114

  75. Zeng T, Li J (2010) Maximization of negative correlations in time-course gene expression data for enhancing understanding of molecular pathways. Nucl Acids Res 38(1):e1

  76. Zhao Y, Yu J, Wang G, Chen L, Wang B, Yu G (2008) Maximal subspace coregulated gene clustering. Knowl Data Eng IEEE Trans 20(1):83–98. https://doi.org/10.1109/TKDE.2007.190670

  77. Zhou H, Lin W, Labra SR, Lipton SA, Schork NJ, Rangan AV (2022) Detecting Boolean asymmetric relationships with a loop counting technique and its implications for analyzing heterogeneity within gene expression datasets. bioRxiv 2022–08

Funding

Not applicable.

Author information

Contributions

AH contributed to the idea, algorithm, theoretical analysis, writing, and experiments. SB contributed to the theoretical analysis and writing.

Corresponding author

Correspondence to Amina Houari.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Ethical approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This paper is an extended version of our work proposed in [35].

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 36 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Houari, A., Ben Yahia, S. A Top-K formal concepts-based algorithm for mining positive and negative correlation biclusters of DNA microarray data. Int. J. Mach. Learn. & Cyber. 15, 941–962 (2024). https://doi.org/10.1007/s13042-023-01949-9

  • DOI: https://doi.org/10.1007/s13042-023-01949-9
