Abstract
Analysis of gene expression data generated by microarray techniques often includes clustering. Although more reliable methods are available, hierarchical algorithms are still frequently employed. We clustered several data sets and quantitatively compared the performance of an agglomerative hierarchical approach using the average-linkage method with two partitioning procedures, k-means and fuzzy c-means. Investigation of the results revealed the superiority of the partitioning algorithms: the compactness of the clusters was markedly increased and the arrangement of the profiles into clusters more closely resembled biological categories. Therefore, we encourage analysts to critically scrutinize the results obtained by clustering.
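The kind of comparison described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the six one-dimensional "profiles" are made-up toy values, the within-cluster sum of squares is assumed here as the compactness measure, k-means uses a deterministic farthest-point initialization chosen only to make the run reproducible, and fuzzy c-means is left out for brevity. All function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two profiles (tuples of floats)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of profiles."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def within_ss(data, labels):
    """Compactness: summed squared distance of each profile to its cluster centroid."""
    clusters = {}
    for p, lab in zip(data, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        c = centroid(members)
        total += sum(sq_dist(p, c) for p in members)
    return total


def average_linkage(data, k):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest mean pairwise Euclidean distance until k clusters remain."""
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(sq_dist(data[i], data[j]) ** 0.5
                        for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)
    labels = [0] * len(data)
    for lab, members in enumerate(clusters):
        for i in members:
            labels[i] = lab
    return labels


def k_means(data, k, n_iter=100):
    """Lloyd's k-means with farthest-point initialization (an assumption made
    for reproducibility; random restarts are the more common choice)."""
    centers = [data[0]]
    while len(centers) < k:
        centers.append(max(data, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in data]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:
                centers[c] = centroid(members)
    return labels


# Hypothetical toy data: five spread-out values plus one distant value.
data = [(0.0,), (2.1,), (4.0,), (6.2,), (8.0,), (13.0,)]
hier_labels = average_linkage(data, 2)
km_labels = k_means(data, 2)
print("average linkage:", hier_labels, round(within_ss(data, hier_labels), 4))
print("k-means:        ", km_labels, round(within_ss(data, km_labels), 4))
```

On this toy input, average linkage isolates the distant value in its own cluster, while k-means splits the remaining values and reaches a lower within-cluster sum of squares. That is expected, since k-means optimizes this criterion directly. A lower value does not by itself show that a partition is biologically more meaningful, so agreement with known categories would need a separate check.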
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Radke, D., Möller, U. (2004). Quantitative Evaluation of Established Clustering Methods for Gene Expression Data. In: Barreiro, J.M., Martín-Sánchez, F., Maojo, V., Sanz, F. (eds) Biological and Medical Data Analysis. ISBMDA 2004. Lecture Notes in Computer Science, vol 3337. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30547-7_40
DOI: https://doi.org/10.1007/978-3-540-30547-7_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23964-2
Online ISBN: 978-3-540-30547-7
eBook Packages: Springer Book Archive