Summary
Microarray technology is used to study gene regulation at the genome and transcriptome levels. In its most common application, the expression levels of thousands of genes are monitored simultaneously, producing a large, high-dimensional dataset. It is assumed that genes with similar functions or shared regulatory elements display similar expression profiles across a variety of biological conditions. In some cases, it may be desirable to study many drugs simultaneously under different experimental conditions (e.g. concentration or time point) in biological models, which generates 3-way data. Cluster analysis is used to identify biologically relevant groups of genes; in this chapter, fuzzy cluster analysis is used for this purpose. After a brief formulation of the problem, we outline the motivations for our choice of clustering algorithm. The fuzzy clustering algorithms are then presented, and their main tuning parameters are discussed in the context of 2-way and 3-way microarray data. We propose a transformation that increases the contrast between the distances of all pairs of samples in a dataset. This increases the likelihood of detecting a group structure, if one exists, in a high-dimensional dataset. The performance of the fuzzy C-means algorithm is illustrated on real datasets, and these results are finally validated through functional enrichment analysis of the genes.
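To make the clustering step concrete, here is a minimal sketch of the textbook fuzzy C-means iteration, which alternates between updating cluster centers and membership degrees, using Euclidean distances on a 2-way (genes × conditions) matrix. It assumes NumPy; the function name `fuzzy_c_means`, its defaults, the synthetic data, and the fuzzifier value m = 1.25 are illustrative assumptions, not the chapter's code or recommended settings. The sketch does not implement the chapter's distance-contrast transformation or its 3-way extension.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Textbook fuzzy C-means with Euclidean distance (illustrative sketch).

    X : array of shape (n_samples, n_features), e.g. genes x conditions.
    c : number of clusters; m : fuzzifier, must be > 1.
    Returns (centers of shape (c, n_features), memberships U of shape (c, n_samples)).
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be > 1")
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships; each column (one sample) sums to 1.
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        # Centers are membership-weighted means of the samples.
        centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Squared Euclidean distances, shape (c, n); the floor avoids division by zero.
        d2 = ((X[None, :, :] - centers[:, None, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, np.finfo(float).eps)
        # u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)), written with squared distances.
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


# Hypothetical illustrative data: two well-separated groups of 50 "genes" in 4 conditions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, size=(50, 4)),
               rng.normal(3.0, 0.3, size=(50, 4))])
centers, U = fuzzy_c_means(X, c=2, m=1.25)
labels = U.argmax(axis=0)  # hard assignment by maximal membership
```

The fuzzifier m and the number of clusters c are the parameters to tune: as m approaches 1 the memberships become nearly crisp (k-means-like), while a large m pushes every membership toward 1/c.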
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Dembélé, D. (2009). Microarray Data Analysis Using Fuzzy Clustering Algorithms. In: Jin, Y., Wang, L. (eds) Fuzzy Systems in Bioinformatics and Computational Biology. Studies in Fuzziness and Soft Computing, vol 242. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89968-6_5
DOI: https://doi.org/10.1007/978-3-540-89968-6_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89967-9
Online ISBN: 978-3-540-89968-6
eBook Packages: Engineering, Engineering (R0)