Abstract
Interpreting microarray data is a central task in computational analysis. Techniques such as clustering and subspace clustering are applied to these datasets to extract significant results that bear on practical healthcare problems such as drug discovery and disease identification. Clustering techniques group genes that exhibit similar behavior under particular conditions. Because classical clustering operates globally, over all conditions, it fails when genes should be grouped according to only a subset of the conditions. Biclustering addresses this issue by evaluating gene groupings under only a subset of the conditions. However, biclustering cannot analyze longitudinal experiments, which examine gene expression profiles under a subset of conditions at different time points. This limitation motivates the use of triclustering on gene expression microarray data. Triclustering finds a set of genes that behave similarly under a subset of conditions at certain time points. In this research article, two new frameworks based on different versions of parallel genetic algorithms are proposed to detect significant triclusters in gene expression profiles. The algorithm in the first framework is based on the coarse-grained parallel genetic approach, and the algorithm in the second framework is based on the dynamic deme parallel genetic approach. Both consider the experimental conditions along with the time points, and both parallelize the search to reduce computation time. The proposed frameworks are tested on a standard yeast cell cycle (Saccharomyces cerevisiae) dataset and on synthetic versions of it, which are widely used in gene expression analysis.
Performance is analyzed in terms of the convergence speed of both algorithms for different input sizes and in terms of computation time. A statistical analysis is performed, and the biological relevance of the simulation results is then established through functional annotations derived from the Gene Ontology and KEGG pathway analysis. The proposed frameworks are shown to be effective in comparison with other state-of-the-art schemes. Experimental results reveal that the proposed architectures are efficient, consuming less computation time because of their inherent parallelism. The suggested architectures can therefore be considered reliable frameworks, preferable to traditional genetic approaches for analyzing gene expression microarray data from the triclustering perspective.
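To make the notion of a tricluster concrete, the following is a minimal Python sketch and not the authors' implementation. It treats the data as a genes × conditions × time points array, selects a sub-block by three index subsets, and scores it with a three-dimensional mean squared residue, a coherence measure of the kind used in the triclustering literature. The abstract does not state which fitness function the proposed algorithms use, so the scoring function, the function names, and the synthetic data here are all assumptions made for illustration.

```python
import numpy as np


def extract_tricluster(data, genes, conditions, times):
    """Return the sub-block of a 3D array picked out by three index subsets."""
    return data[np.ix_(genes, conditions, times)]


def mean_squared_residue_3d(block):
    """Assumed coherence score: it is 0 for a perfectly additive pattern."""
    total = block.mean()
    gene_mean = block.mean(axis=(1, 2), keepdims=True)  # one mean per gene
    cond_mean = block.mean(axis=(0, 2), keepdims=True)  # one mean per condition
    time_mean = block.mean(axis=(0, 1), keepdims=True)  # one mean per time point
    residue = block - gene_mean - cond_mean - time_mean + 2.0 * total
    return float(np.mean(residue ** 2))


# Hypothetical synthetic data: 6 genes, 4 conditions, 5 time points of noise.
rng = np.random.default_rng(0)
data = rng.normal(size=(6, 4, 5))

# Plant an additive (coherent) pattern on a subset of each dimension.
genes, conditions, times = [0, 2, 4], [1, 3], [0, 1, 2]
gene_effect = np.array([1.0, 2.0, 3.0])
cond_effect = np.array([0.5, -0.5])
time_effect = np.array([0.0, 1.0, 2.0])
data[np.ix_(genes, conditions, times)] = (
    gene_effect[:, None, None]
    + cond_effect[None, :, None]
    + time_effect[None, None, :]
)

planted = extract_tricluster(data, genes, conditions, times)
noise = extract_tricluster(data, [1, 3, 5], [0, 2], [3, 4])

planted_score = mean_squared_residue_3d(planted)
noise_score = mean_squared_residue_3d(noise)
print(planted.shape, planted_score, noise_score)
```

Under these assumptions the planted block scores essentially zero while the pure-noise block scores higher. A genetic algorithm for triclustering can use a score of this kind as (part of) the fitness of a candidate triple of index subsets.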
Cite this article
Biswal, B.S., Mohapatra, A. & Vipsita, S. Triclustering of gene expression microarray data using coarse grained and dynamic deme based parallel genetic approach. Evol. Intel. 13, 475–495 (2020). https://doi.org/10.1007/s12065-019-00330-6
DOI: https://doi.org/10.1007/s12065-019-00330-6