Evaluating Clustering Algorithms for Genetic Regulatory Network Structural Inference

Fogelberg, Christopher; Palade, Vasile

doi:10.1007/978-1-84882-983-1_10

Abstract

Modern biological research increasingly recognises the importance of genome-wide gene regulatory network inference; however, a range of statistical, technological and biological factors make it a difficult and intractable problem. One approach used in some research is to cluster the data and then infer a structural model of the clusters. With this kind of approach, it is very important to choose the clustering algorithm carefully. In this paper we explicitly analyse the attributes that make a clustering algorithm appropriate, and we also consider how to measure the quality of the identified clusters. Our analysis leads us to develop three novel cluster quality measures that are based on regulatory overlap. Using these measures we evaluate two modern candidate algorithms: FLAME and KMART. Although FLAME was specifically developed for clustering gene expression profile data, we find that KMART is probably a better algorithm to use if the goal is to infer a structural model of the clusters.

Author information

Authors and Affiliations

Oxford University Computing Laboratory, OX1-3QD, Oxford, UK
Christopher Fogelberg
Oxford-Man Institute, OX1-4EH, Oxford, UK
Christopher Fogelberg
Oxford University Computing Laboratory, OX1-3QD, Oxford, UK
Vasile Palade

Authors

Christopher Fogelberg
Vasile Palade

Corresponding author

Correspondence to Christopher Fogelberg.

Editor information

Editors and Affiliations

Dept. Computer Science and, University of Portsmouth, Lion Terrace, Portsmouth, PO1 3HE, United Kingdom
Max Bramer
Stratum Management Ltd., Southbrook Place 11, Micheldever, Hants., SO21 3DE, United Kingdom
Richard Ellis
School of Computing &, University of Greenwich, Park Row 30, London, SE10 9LS, United Kingdom
Miltos Petridis

Cite this paper

Fogelberg, C., Palade, V. (2010). Evaluating Clustering Algorithms for Genetic Regulatory Network Structural Inference. In: Bramer, M., Ellis, R., Petridis, M. (eds) Research and Development in Intelligent Systems XXVI. Springer, London. https://doi.org/10.1007/978-1-84882-983-1_10

DOI: https://doi.org/10.1007/978-1-84882-983-1_10
Published: 19 October 2009
Publisher Name: Springer, London
Print ISBN: 978-1-84882-982-4
Online ISBN: 978-1-84882-983-1
eBook Packages: Computer Science (R0)