Semi Supervised Spectral Clustering for Regulatory Module Discovery

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5109)

Abstract

We propose a novel semi-supervised clustering method for gene regulatory module discovery. The technique uses DNA-binding data as prior knowledge to guide the spectral clustering of microarray experiments. The microarray data from a set of repeat experiments are converted to an affinity (similarity) matrix using a Gaussian function. We investigated two methods for determining the optimal Gaussian variance for this purpose: the first is based on a statistical measure of cluster coherence, and the second on optimising the number of constraints satisfied in the clustering process. The constraints, which were derived from DNA-binding data, were used to adjust the affinity matrix to include known gene-gene interactions. Clusters were found using a spectral clustering algorithm and validated with a biological significance score, defined as the proportion of gene pairs in the resulting clusters that share a common transcription factor. Our results indicate that the technique can successfully leverage the information available in the DNA-binding data. To the best of our knowledge, this is a novel formulation for gene module discovery.
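The pipeline outlined in the abstract can be sketched in a few small functions. This is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: the Gaussian affinity uses the common form exp(-d²/(2σ²)) with a zero diagonal; must-link pairs derived from DNA-binding data are assumed to be enforced by setting their affinity to 1.0 (the paper's exact adjustment rule may differ); and the biological significance score is assumed to be pooled over all same-cluster gene pairs. The spectral clustering step itself (for example the Ng, Jordan and Weiss algorithm applied to the adjusted matrix) is not shown; the `labels` list stands in for its output. All function names and the toy data are hypothetical.

```python
import math
from itertools import combinations


def gaussian_affinity(profiles, sigma):
    """Affinity A[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)), zero diagonal."""
    n = len(profiles)
    A = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d2 = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
        A[i][j] = A[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return A


def apply_must_links(A, pairs):
    """Return a copy of A with each must-link pair set to the maximum affinity 1.0.

    Assumption: one simple way to 'include known gene-gene interactions';
    the paper's exact adjustment rule may differ.
    """
    B = [row[:] for row in A]
    for i, j in pairs:
        B[i][j] = B[j][i] = 1.0
    return B


def constraints_satisfied(labels, pairs):
    """Count must-link pairs whose two genes ended up in the same cluster."""
    return sum(1 for i, j in pairs if labels[i] == labels[j])


def biological_significance(labels, regulators):
    """Fraction of same-cluster gene pairs sharing at least one transcription factor.

    regulators[g] is the (hypothetical) set of transcription factors bound to gene g.
    """
    same = [(i, j) for i, j in combinations(range(len(labels)), 2)
            if labels[i] == labels[j]]
    if not same:
        return 0.0
    shared = sum(1 for i, j in same if regulators[i] & regulators[j])
    return shared / len(same)


if __name__ == "__main__":
    # Hypothetical toy data: four genes, two expression values each.
    profiles = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
    A = apply_must_links(gaussian_affinity(profiles, sigma=1.0), [(0, 1)])
    labels = [0, 0, 1, 1]  # stand-in for the output of a spectral clustering step
    regulators = [{"TF_A"}, {"TF_A", "TF_B"}, {"TF_B"}, set()]
    print(constraints_satisfied(labels, [(0, 1), (0, 2)]))  # 1
    print(biological_significance(labels, regulators))       # 0.5
```

To mimic the second variance-selection strategy, one could cluster the adjusted matrix for several candidate σ values and keep the σ with the highest `constraints_satisfied` count.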

Editor information

Amos Bairoch Sarah Cohen-Boulakia Christine Froidevaux

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mishra, A., Gillies, D. (2008). Semi Supervised Spectral Clustering for Regulatory Module Discovery. In: Bairoch, A., Cohen-Boulakia, S., Froidevaux, C. (eds) Data Integration in the Life Sciences. DILS 2008. Lecture Notes in Computer Science, vol 5109. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69828-9_19

  • DOI: https://doi.org/10.1007/978-3-540-69828-9_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69827-2

  • Online ISBN: 978-3-540-69828-9

  • eBook Packages: Computer Science, Computer Science (R0)
