Abstract
Many methods have been developed for reverse engineering gene networks from time series expression data. However, as the number of genes and the complexity of regulation increase, inferring gene networks becomes increasingly difficult. To tackle this scalability problem, this study presents a two-phase approach: gene clustering followed by network reconstruction. For gene clustering, a hybrid of data-based and knowledge-based clustering was developed that calculates both the data similarity and the semantic similarity between genes. For network reconstruction, a Boolean network model inferred from the gene clusters was used to represent the network. A series of experiments was conducted to investigate the effect of the hybrid similarity measure on gene clustering and network reconstruction. The results demonstrate the feasibility and effectiveness of the proposed approach.
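As a rough illustration of what a hybrid similarity measure might look like, the sketch below blends an expression-based similarity with a precomputed semantic similarity. The abstract does not give the formula, so everything here is an assumption made for illustration: the use of Pearson correlation as the data similarity, the rescaling to [0, 1], the linear weighting, and the names `pearson`, `hybrid_similarity`, and `alpha` are all hypothetical and should not be read as the authors' method.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no correlation information
    return cov / (sd_x * sd_y)


def hybrid_similarity(expr_a, expr_b, semantic_sim, alpha=0.5):
    """Blend data similarity and semantic similarity (both in [0, 1]).

    alpha is a hypothetical weight: 1.0 uses expression data only,
    0.0 uses the knowledge-based (semantic) score only.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Rescale correlation from [-1, 1] to [0, 1] so both terms share a range.
    data_sim = (pearson(expr_a, expr_b) + 1.0) / 2.0
    return alpha * data_sim + (1.0 - alpha) * semantic_sim


# Two perfectly correlated profiles with a moderate semantic score.
score = hybrid_similarity([1, 2, 3, 4], [2, 4, 6, 8], semantic_sim=0.4)
print(round(score, 3))
```

In such a scheme, a matrix of pairwise scores would then be passed to a clustering algorithm, and the weight would be tuned to balance the two sources of evidence.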
Cite this article
Lee, WP., Lin, CH. Combining Expression Data and Knowledge Ontology for Gene Clustering and Network Reconstruction. Cogn Comput 8, 217–227 (2016). https://doi.org/10.1007/s12559-015-9349-5
DOI: https://doi.org/10.1007/s12559-015-9349-5