Abstract
Modern biological research increasingly recognises the importance of genome-wide gene regulatory network inference; however, a range of statistical, technological and biological factors make it a difficult and intractable problem. One approach taken in earlier research is to cluster the data first and then infer a structural model of the clusters. With this approach, the choice of clustering algorithm is critical. In this paper we explicitly analyse the attributes that make a clustering algorithm appropriate, and we consider how to measure the quality of the clusters it identifies. Our analysis leads us to develop three novel cluster quality measures based on regulatory overlap. Using these measures we evaluate two modern candidate algorithms, FLAME and KMART. Although FLAME was developed specifically for clustering gene expression profile data, we find that KMART is probably the better algorithm when the goal is to infer a structural model of the clusters.
Copyright information
© 2010 Springer-Verlag London
About this paper
Cite this paper
Fogelberg, C., Palade, V. (2010). Evaluating Clustering Algorithms for Genetic Regulatory Network Structural Inference. In: Bramer, M., Ellis, R., Petridis, M. (eds) Research and Development in Intelligent Systems XXVI. Springer, London. https://doi.org/10.1007/978-1-84882-983-1_10
DOI: https://doi.org/10.1007/978-1-84882-983-1_10
Publisher Name: Springer, London
Print ISBN: 978-1-84882-982-4
Online ISBN: 978-1-84882-983-1
eBook Packages: Computer Science (R0)