Abstract
Many biclustering algorithms and bicluster criteria have been proposed for analyzing gene expression data. However, there is little guidance on which biclustering algorithm to choose, so ensemble biclustering methods, which aggregate the advantages of various biclustering algorithms, have received much attention. Although the co-association consensus (COAC) method is a landmark of ensemble biclustering, its effectiveness and efficiency are the worst among state-of-the-art methods. In this paper, to improve on COAC, we propose spectral ensemble biclustering (SEB), which pairs a novel method for generating a set of basic biclusters of better quality and higher diversity with a new consensus method for combining these basic biclusters. In SEB, spectral clustering is applied directly to the co-association matrix and is equivalently transformed into a weighted k-means problem. Experiments on six gene expression data sets demonstrate that SEB outperforms existing ensemble methods in effectiveness, efficiency and scalability, as measured by biological significance and runtime.
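The idea of clustering a co-association matrix spectrally can be illustrated with a minimal sketch. It is not the SEB algorithm itself. It assumes that each basic bicluster is represented only by its set of row (gene) indices, that the co-association of two rows is the fraction of basic biclusters containing both, and that standard normalized spectral clustering (in the style of Ng, Jordan and Weiss) followed by plain k-means stands in for the weighted k-means reformulation described above. All function names and the toy data are hypothetical.

```python
import numpy as np

def co_association(row_sets, n_rows):
    """Fraction of basic biclusters in which each pair of rows co-occurs."""
    S = np.zeros((n_rows, n_rows))
    for rows in row_sets:
        idx = np.asarray(sorted(rows))
        S[np.ix_(idx, idx)] += 1.0
    return S / len(row_sets)

def spectral_embedding(S, k):
    """Top-k eigenvectors of D^{-1/2} S D^{-1/2}, rows scaled to unit length."""
    d = S.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    M = S * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(M)          # eigenvalues in ascending order
    U = vecs[:, -k:]                     # keep the k largest
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    return U / np.maximum(norms, 1e-12)

def kmeans(X, k, n_iter=50):
    """Plain Lloyd k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(sq, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Toy ensemble (hypothetical): six genes, six basic biclusters given by row sets.
row_sets = [{0, 1, 2}, {0, 1}, {1, 2}, {3, 4, 5}, {3, 4}, {4, 5}]
S = co_association(row_sets, n_rows=6)
labels = kmeans(spectral_embedding(S, k=2), k=2)
print(labels)
```

In this toy ensemble, genes 0 to 2 and genes 3 to 5 never share a basic bicluster, so the co-association matrix is block diagonal and the consensus step separates the two groups. A full ensemble biclustering method would also have to recover the column (condition) side of each consensus bicluster, which this sketch leaves out.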
Acknowledgements
This research was supported in part by the National Natural Science Foundation of China (NSFC) under grant 60903074 and the National High Technology Research and Development Program of China (863 Program) under grant 2008AA01Z119.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest.
About this article
Cite this article
Yin, L., Liu, Y. Ensemble biclustering gene expression data based on the spectral clustering. Neural Comput & Applic 30, 2403–2416 (2018). https://doi.org/10.1007/s00521-016-2819-1
DOI: https://doi.org/10.1007/s00521-016-2819-1