Abstract
We propose a novel semi-supervised clustering method for the task of gene regulatory module discovery. The technique uses DNA-binding data as prior knowledge to guide the spectral clustering of microarray experiments. The microarray data from a set of repeat experiments are converted to an affinity, or similarity, matrix using a Gaussian function. We investigated two methods for determining the optimal Gaussian variance for this purpose. The first was based on a statistical measure of cluster coherence, and the second on optimising the number of constraints satisfied in the clustering process. The constraints, which were derived from the DNA-binding data, were used to adjust the affinity matrix to include known gene-gene interactions. Clusters were found using a spectral clustering algorithm and validated using a biological significance score, defined as the proportion of gene pairs sharing a common transcription factor in the resulting clusters. Our results indicate that the technique can successfully leverage the information available in the DNA-binding data. To the best of our knowledge, this is a novel formulation for the purpose of gene module discovery.
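The steps named in the abstract can be illustrated with a minimal sketch. It covers only three of them: building a Gaussian affinity matrix, adjusting it with pairwise constraints, and scoring a clustering by the proportion of co-clustered gene pairs that share a transcription factor. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the toy expression profiles and transcription factor sets, the zero diagonal, and the rule of setting a constrained pair to affinity 1.0. The eigen-decomposition step of spectral clustering is left out, so the cluster labels are supplied by hand.

```python
import math
from itertools import combinations


def gaussian_affinity(profiles, sigma):
    """A[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)); diagonal left at 0."""
    n = len(profiles)
    A = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d2 = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
        A[i][j] = A[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return A


def apply_constraints(A, linked_pairs):
    """Return a copy of A with each constrained pair set to affinity 1.0.

    Assumed adjustment rule: the paper may adjust the matrix differently.
    """
    B = [row[:] for row in A]
    for i, j in linked_pairs:
        B[i][j] = B[j][i] = 1.0
    return B


def shared_tf_score(labels, tf_sets):
    """Proportion of same-cluster gene pairs sharing at least one TF."""
    same = shared = 0
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            same += 1
            if tf_sets[i] & tf_sets[j]:
                shared += 1
    return shared / same if same else 0.0


# Hypothetical toy data: four genes, two expression measurements each.
profiles = [[0.0, 0.1], [0.1, 0.0], [2.0, 2.1], [2.1, 2.0]]
A = gaussian_affinity(profiles, sigma=1.0)

# Hypothetical constraint: binding data links gene 0 and gene 2.
B = apply_constraints(A, [(0, 2)])

# Hand-assigned labels stand in for the output of spectral clustering.
labels = [0, 0, 1, 1]
tf_sets = [{"GCN4"}, {"GCN4", "MSN2"}, {"MSN2"}, {"YAP1"}]
score = shared_tf_score(labels, tf_sets)
```

In this toy case two gene pairs are co-clustered and one of them shares a transcription factor, so `score` is 0.5. A smaller `sigma` makes the affinities fall off faster with distance, which is why the choice of variance affects the resulting clusters.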
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mishra, A., Gillies, D. (2008). Semi Supervised Spectral Clustering for Regulatory Module Discovery. In: Bairoch, A., Cohen-Boulakia, S., Froidevaux, C. (eds) Data Integration in the Life Sciences. DILS 2008. Lecture Notes in Computer Science, vol 5109. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69828-9_19
DOI: https://doi.org/10.1007/978-3-540-69828-9_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69827-2
Online ISBN: 978-3-540-69828-9
eBook Packages: Computer Science