Abstract
When a gene expression dataset contains some labeled samples, the label information should be incorporated into the clustering algorithm so that more reasonable clustering results can be achieved. In this paper, a novel semi-supervised clustering algorithm, the Semi-supervised Iterative Visual Clustering Algorithm (Semi-IVCA), is presented to tackle such datasets. The new algorithm first constructs the visual sampling image of the dataset based on the visual theorem and obtains its attractors using gradient learning rules, where each attractor denotes a cluster of the dataset. It then introduces an iterative clustering procedure to realize semi-supervised learning. The new algorithm is a generalization of the Visual Clustering Algorithm (VCA) previously presented by the authors. Besides effectively utilizing the labeled data in clustering, Semi-IVCA is robust and insensitive to initialization, has strong parameter learning capability, and yields clustering results that are easy to interpret. Experimental results on artificial and real gene expression datasets confirm these advantages of Semi-IVCA.
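The idea that each attractor denotes a cluster can be illustrated with a small sketch. It is not the Semi-IVCA or VCA procedure, whose details are not given in this abstract. It substitutes a standard mean-shift update, which climbs a Gaussian-smoothed density estimate of the data, and it omits the semi-supervised iterative step. The function name `find_attractors` and the parameters `sigma`, `steps`, `tol`, and `merge_radius` are hypothetical and chosen only for illustration.

```python
import numpy as np


def find_attractors(points, sigma=0.5, steps=200, tol=1e-6, merge_radius=None):
    """Illustrative stand-in (mean shift), NOT the paper's algorithm.

    Each sample is moved uphill on a Gaussian-smoothed density of the data
    until it settles; samples that settle at the same place share a cluster.
    """
    points = np.asarray(points, dtype=float)
    if merge_radius is None:
        merge_radius = sigma / 2.0  # assumed heuristic for merging end points

    current = points.copy()
    for _ in range(steps):
        # Gaussian weights between every moving position and every sample.
        diff = current[:, None, :] - points[None, :, :]
        weights = np.exp(-np.sum(diff ** 2, axis=2) / (2.0 * sigma ** 2))
        # Mean-shift update: move to the weighted mean of the samples.
        updated = weights @ points / weights.sum(axis=1, keepdims=True)
        shift = np.max(np.linalg.norm(updated - current, axis=1))
        current = updated
        if shift < tol:
            break

    # Merge converged positions that lie close together into one attractor.
    attractors = []
    labels = np.empty(len(points), dtype=int)
    for i, pos in enumerate(current):
        for k, centre in enumerate(attractors):
            if np.linalg.norm(pos - centre) < merge_radius:
                labels[i] = k
                break
        else:
            attractors.append(pos)
            labels[i] = len(attractors) - 1
    return np.array(attractors), labels


if __name__ == "__main__":
    # Synthetic data: two well-separated groups of 20 points each.
    rng = np.random.default_rng(0)
    group_a = rng.normal(0.0, 0.1, size=(20, 2))
    group_b = rng.normal(5.0, 0.1, size=(20, 2))
    attractors, labels = find_attractors(np.vstack([group_a, group_b]))
    print(len(attractors), labels)
```

In this sketch the smoothing width `sigma` plays the role of a scale parameter: a larger value merges nearby groups into a single attractor, while a smaller value splits the data into more clusters.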
Cite this article
Chung, Fl., Wang, S., Deng, Z. et al. Clustering Analysis of Gene Expression Data based on Semi-supervised Visual Clustering Algorithm. Soft Comput 10, 981–993 (2006). https://doi.org/10.1007/s00500-005-0025-7
DOI: https://doi.org/10.1007/s00500-005-0025-7