An improved combinatorial biclustering algorithm

  • Original Article
  • Published in: Neural Computing and Applications

Abstract

DNA microarray analysis is an important technology in genetic research for exploring and recognizing possible genomic features of many diseases. Because it is a high-throughput technology, it requires advanced tools for dimensionality reduction in massive data sets. Clustering is among the most appropriate tools for mining these data, although it suffers from the following problems: instability of the results, a large number of genes compared with the number of samples, a high noise level, complexity of initialization, and the need to group genes and samples simultaneously. Almost all of these problems can be addressed by novel techniques such as biclustering. In this paper, a new biclustering algorithm is proposed, hereafter denoted the combinatorial biclustering algorithm (CBA), that addresses the problems listed above. The algorithm analyzes the data to find biclusters of a desired size and within an allowable error. The performance of CBA is compared with that of other biclustering algorithms by discussing the output of the different methods when run on a synthetic data set. CBA appears to perform better and, for this reason, has also been applied to a real data set. In particular, CBA was used to analyze the transcriptional profiles of 38 gastric cancer tissues with microsatellite instability (MSI) and without it (microsatellite stable, MSS). The results clearly show much more coherent gene expression behavior in normal tissues than in tumoral ones. The high level of gene misregulation in tumoral tissues affects any further bicluster analysis, and it is only partially smoothed in the MSI/MSS study, even when a much higher initial admissible error is allowed.
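
To make the idea of searching for biclusters "of the desired size and allowable error" concrete, the following is a minimal illustrative sketch and not the CBA procedure itself, whose details are not given in the abstract. It assumes a simple error measure (the largest within-column spread of a submatrix) and uses exhaustive enumeration, which is only feasible for toy matrices. The names `bicluster_error`, `find_biclusters`, `n_rows`, `n_cols`, and `max_error` are hypothetical.

```python
from itertools import combinations

import numpy as np


def bicluster_error(sub):
    """Largest within-column spread (max minus min) of a submatrix.

    Assumed error measure for illustration only; the paper may use another.
    """
    return float(np.max(sub.max(axis=0) - sub.min(axis=0)))


def find_biclusters(data, n_rows, n_cols, max_error):
    """Exhaustively list (rows, cols) index tuples whose n_rows x n_cols
    submatrix has a within-column spread of at most max_error."""
    found = []
    for cols in combinations(range(data.shape[1]), n_cols):
        for rows in combinations(range(data.shape[0]), n_rows):
            sub = data[np.ix_(rows, cols)]
            if bicluster_error(sub) <= max_error:
                found.append((rows, cols))
    return found


# Toy matrix: rows 0, 2, 3 agree on columns 1 and 3 up to small noise.
data = np.array([
    [5.0, 1.0, 9.0, 4.0],
    [0.0, 7.0, 2.0, 8.0],
    [3.0, 1.1, 6.0, 4.1],
    [8.0, 0.9, 0.0, 3.9],
])

result = find_biclusters(data, n_rows=3, n_cols=2, max_error=0.25)
print(result)  # prints [((0, 2, 3), (1, 3))]
```

The exhaustive search grows combinatorially with the matrix size, so a practical algorithm would need pruning or heuristics; the sketch only shows how a size constraint and an error threshold jointly define which submatrices count as biclusters.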

Acknowledgments

This work is supported by Istituto Nazionale di Alta Matematica Francesco Severi (INdAM) with the scholarship N U 2007/000458 07/09/2007.

Author information

Corresponding author

Correspondence to Francesco Napolitano.

About this article

Cite this article

Nosova, E., Napolitano, F., Amato, R. et al. An improved combinatorial biclustering algorithm. Neural Comput & Applic 22 (Suppl 1), 293–302 (2013). https://doi.org/10.1007/s00521-012-0902-9

  • DOI: https://doi.org/10.1007/s00521-012-0902-9
