Clustering Analysis of Gene Expression Data based on Semi-supervised Visual Clustering Algorithm

  • Original Paper
  • Journal: Soft Computing

Abstract

When a gene expression dataset contains some labeled samples, that label information should be incorporated into the clustering algorithm so that more reasonable clustering results can be obtained. This paper presents a novel semi-supervised clustering algorithm, the Semi-supervised Iterative Visual Clustering Algorithm (Semi-IVCA), to handle such datasets. The algorithm first constructs the visual sampling image of the dataset based on the visual theorem and obtains its attractors using gradient learning rules, where each attractor denotes a cluster of the dataset. It then introduces an iterative clustering procedure to realize semi-supervised learning. Semi-IVCA generalizes the Visual Clustering Algorithm (VCA) previously presented by the authors. Besides effectively using labeled data in clustering, Semi-IVCA is robust and insensitive to initialization, has strong parameter learning capability, and yields clustering results that are easy to interpret. Experiments on artificial and real gene expression datasets confirm these advantages.
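The abstract gives no formulas, so the following is only an illustrative sketch of the general idea and not the authors' Semi-IVCA. It assumes a Gaussian-blurred density of the samples as a stand-in for the visual sampling image, a mean-shift update (a gradient-ascent step with an adaptive step size) as a stand-in for the gradient learning rules, and a coarse-to-fine loop over the blur scale as a stand-in for the iterative semi-supervised procedure. The function names, the `sigma` and `merge_radius` parameters, and the label-consistency rule are all hypothetical.

```python
import numpy as np


def find_attractors(X, sigma, steps=200, tol=1e-6):
    """Move each sample uphill on a Gaussian-blurred density of X.

    Uses the mean-shift update, which is a gradient-ascent step on the
    density with an adaptive step size. (Assumed stand-in, not the paper's rule.)
    """
    Z = X.astype(float).copy()
    for _ in range(steps):
        # Squared distances between current positions and all samples.
        d2 = ((Z[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        W = np.exp(-d2 / (2.0 * sigma ** 2))
        Z_new = (W @ X) / W.sum(axis=1, keepdims=True)
        done = np.abs(Z_new - Z).max() < tol
        Z = Z_new
        if done:
            break
    return Z


def group_by_attractor(Z, merge_radius):
    """Give the same cluster id to samples whose end points nearly coincide."""
    labels = np.empty(len(Z), dtype=int)
    centers = []
    for i, z in enumerate(Z):
        for k, c in enumerate(centers):
            if np.linalg.norm(z - c) < merge_radius:
                labels[i] = k
                break
        else:
            centers.append(z)
            labels[i] = len(centers) - 1
    return labels, np.array(centers)


def semi_supervised_clusters(X, known, sigmas):
    """Pick the coarsest scale whose clusters never mix two known labels.

    known maps a sample index to its class label (hypothetical interface).
    Returns (sigma, labels, centers), or None if no scale is consistent.
    """
    for sigma in sorted(sigmas, reverse=True):
        Z = find_attractors(X, sigma)
        labels, centers = group_by_attractor(Z, merge_radius=sigma / 2.0)
        seen = {}
        consistent = True
        for idx, y in known.items():
            # A cluster is inconsistent if it already holds a different label.
            if seen.setdefault(labels[idx], y) != y:
                consistent = False
        if consistent:
            return sigma, labels, centers
    return None


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 0.3, (30, 2)), rng.normal(5.0, 0.3, (30, 2))])
    result = semi_supervised_clusters(X, {0: "a", 30: "b"}, [10.0, 2.0, 0.5])
    print(result[0], len(result[2]))
```

In this sketch the labeled samples only steer the choice of scale: a scale is rejected when one attractor captures samples with different known labels. The actual way Semi-IVCA uses labels is described in the full paper, not in the abstract.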

Author information

Corresponding author

Correspondence to Shitong Wang.

About this article

Cite this article

Chung, Fl., Wang, S., Deng, Z. et al. Clustering Analysis of Gene Expression Data based on Semi-supervised Visual Clustering Algorithm. Soft Comput 10, 981–993 (2006). https://doi.org/10.1007/s00500-005-0025-7

  • DOI: https://doi.org/10.1007/s00500-005-0025-7
