Evaluating Clustering Algorithms for Genetic Regulatory Network Structural Inference

  • Conference paper
Research and Development in Intelligent Systems XXVI

Abstract

Modern biological research increasingly recognises the importance of genome-wide gene regulatory network inference; however, a range of statistical, technological and biological factors make it a difficult and intractable problem. One approach used in the literature is to cluster the data and then infer a structural model of the clusters. With this kind of approach, the choice of clustering algorithm is critical. In this paper we explicitly analyse the attributes that make a clustering algorithm appropriate, and we consider how to measure the quality of the identified clusters. Our analysis leads us to develop three novel cluster quality measures based on regulatory overlap. Using these measures we evaluate two modern candidate algorithms: FLAME and KMART. Although FLAME was specifically developed for clustering gene expression profile data, we find that KMART is probably the better algorithm if the goal is to infer a structural model of the clusters.
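
The cluster-then-infer approach described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: plain k-means stands in for FLAME and KMART (neither is implemented here), a thresholded correlation between cluster mean profiles stands in for structural inference, and the data, function names and threshold are hypothetical. It does not reproduce the paper's quality measures.

```python
import numpy as np

def farthest_point_init(X, k):
    """Deterministic seeding: start at X[0], then repeatedly add the farthest profile."""
    centroids = [X[0]]
    for _ in range(k - 1):
        # Distance from every profile to its nearest already-chosen centroid.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    return np.array(centroids, dtype=float)

def kmeans(X, k, n_iter=20):
    """Plain k-means, used only as a stand-in clustering step (assumption)."""
    centroids = farthest_point_init(X, k)
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

def cluster_structure(centroids, threshold=0.8):
    """Link two clusters when their mean profiles are strongly (anti-)correlated.

    A deliberately crude stand-in for structural inference (assumption).
    """
    corr = np.corrcoef(centroids)
    adjacency = np.abs(corr) >= threshold
    np.fill_diagonal(adjacency, False)
    return adjacency

# Hypothetical expression data: 30 genes x 20 time points, three groups of ten.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 20, endpoint=False)
base = np.stack([np.sin(t), -np.sin(t), np.cos(t)])
X = np.repeat(base, 10, axis=0) + rng.normal(scale=0.05, size=(30, 20))

labels, centroids = kmeans(X, k=3)
adjacency = cluster_structure(centroids)
```

Here `adjacency` is a symmetric boolean matrix over clusters rather than genes, which is the point of the approach: the structural model is inferred over a handful of clusters instead of thousands of individual genes, so the quality of the clustering directly limits the quality of the inferred structure.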

Author information

Corresponding author

Correspondence to Christopher Fogelberg.

Copyright information

© 2010 Springer-Verlag London

About this paper

Cite this paper

Fogelberg, C., Palade, V. (2010). Evaluating Clustering Algorithms for Genetic Regulatory Network Structural Inference. In: Bramer, M., Ellis, R., Petridis, M. (eds) Research and Development in Intelligent Systems XXVI. Springer, London. https://doi.org/10.1007/978-1-84882-983-1_10

  • DOI: https://doi.org/10.1007/978-1-84882-983-1_10

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84882-982-4

  • Online ISBN: 978-1-84882-983-1

  • eBook Packages: Computer Science, Computer Science (R0)
