Abstract
In this paper, we use the fuzzy C-means (FCM) method to cluster 3-way gene expression data by optimizing multiple objectives. A reformulation of the total clustering criterion yields an expression with fewer variables than the classical FCM criterion. This transformation allows the use of a direct global optimizer, in contrast to the alternating search commonly used. Gene expression data from microarray technology are generally high-dimensional, and such data are known to suffer from the empty space problem. We propose a transformation that increases the contrast in distances between pairs of data samples. This increases the likelihood of detecting group structure, if any, in high-dimensional datasets.
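The abstract does not spell out the reformulated criterion. As an illustration only, the sketch below assumes it is the standard reformulation of Hathaway and Bezdek, in which the memberships are eliminated and the criterion depends on the centroids alone: R_m(V) = sum_k ( sum_i d_ik^(2/(1-m)) )^(1-m). The function names, the fuzziness parameter `m = 2`, the Euclidean distance, and the random toy data are all assumptions made for this example, not details taken from the paper.

```python
import random


def sq_dist(x, v):
    """Squared Euclidean distance between a sample x and a centroid v."""
    return sum((a - b) ** 2 for a, b in zip(x, v))


def reformulated_criterion(X, V, m=2.0, eps=1e-12):
    """R_m(V): FCM criterion written as a function of the centroids only."""
    total = 0.0
    for x in X:
        # eps guards against a sample coinciding exactly with a centroid
        s = sum(max(sq_dist(x, v), eps) ** (1.0 / (1.0 - m)) for v in V)
        total += s ** (1.0 - m)
    return total


def optimal_memberships(X, V, m=2.0, eps=1e-12):
    """Memberships minimizing the classical criterion for fixed centroids.

    Returns U with U[k][i] = membership of sample k in cluster i.
    """
    U = []
    for x in X:
        a = [max(sq_dist(x, v), eps) ** (1.0 / (1.0 - m)) for v in V]
        s = sum(a)
        U.append([ai / s for ai in a])
    return U


def classical_criterion(X, V, U, m=2.0):
    """J_m(U, V) = sum_k sum_i u_ik^m * d_ik^2 (memberships and centroids)."""
    return sum(
        U[k][i] ** m * sq_dist(X[k], V[i])
        for k in range(len(X))
        for i in range(len(V))
    )


# Toy data (hypothetical): 20 samples in 5 dimensions, 3 centroids.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(20)]
V = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(3)]

r = reformulated_criterion(X, V)
j = classical_criterion(X, V, optimal_memberships(X, V))
print(abs(r - j))  # the two criteria agree at the optimal memberships
```

Because `reformulated_criterion` takes only the centroids as free variables (c × p values instead of c × p plus c × n memberships), it can be handed directly to a general-purpose global optimizer rather than minimized by alternating between membership and centroid updates.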
Dembélé, D. Multi-objective optimization for clustering 3-way gene expression data. Adv Data Anal Classif 2, 211–225 (2008). https://doi.org/10.1007/s11634-008-0032-5