Multi-objective semi-supervised clustering of tissue samples for cancer diagnosis

  • Focus
Soft Computing

Abstract

In bioinformatics, clustering the gene expression profiles of tissue samples across experimental conditions has gained importance since the advent of microarray technology. Such clustering bears directly on cancer diagnosis: properly classifying cancer tissue samples measured with microarrays helps detect cancers in an automated way. In this paper we develop a semi-supervised clustering technique for partitioning these gene expression data sets. Semi-supervised clustering combines unsupervised and supervised classification: it uses a small amount of supervised information together with a large collection of unlabeled data. Here, a multi-objective semi-supervised clustering technique is developed for the cancer tissue classification problem, and different combinations of objective functions are used. As supervised information, we assume that class labels are available for 10 % of the data. The proposed technique is evaluated on three open-source benchmark cancer data sets (brain tumor, adult malignancy, and small round blue cell tumors). Two quality measures, the Adjusted Rand Index and Classification Accuracy, are used to assess the obtained partitionings. The results are compared with those of several state-of-the-art clustering techniques. Moreover, significant gene markers are identified from the obtained clustering solutions and demonstrated visually.
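The two quality measures named in the abstract can be illustrated with a small sketch. The Adjusted Rand Index below follows the standard pair-counting definition. The abstract does not define Classification Accuracy, so the majority-vote mapping from clusters to classes used here is an assumption, and the function names `adjusted_rand_index` and `majority_vote_accuracy` are hypothetical rather than taken from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand Index between two labelings (pair-counting form)."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("labelings must have the same length")
    n = len(true_labels)
    if n < 2:
        return 1.0  # no sample pairs to disagree on

    # Contingency counts: samples sharing each (class, cluster) pair.
    pair_counts = Counter(zip(true_labels, pred_labels))
    class_counts = Counter(true_labels)
    cluster_counts = Counter(pred_labels)

    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_class = sum(comb(c, 2) for c in class_counts.values())
    sum_cluster = sum(comb(c, 2) for c in cluster_counts.values())

    expected = sum_class * sum_cluster / comb(n, 2)
    max_index = 0.5 * (sum_class + sum_cluster)
    if max_index == expected:
        # Degenerate case: both labelings are one group or all singletons.
        return 1.0
    return (sum_pairs - expected) / (max_index - expected)


def majority_vote_accuracy(true_labels, pred_labels):
    """Accuracy after mapping each cluster to its most frequent true class.

    Assumption: the paper's exact accuracy definition is not given in the
    abstract; majority voting is one common convention.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("labelings must have the same length")
    per_cluster = {}
    for t, p in zip(true_labels, pred_labels):
        per_cluster.setdefault(p, Counter())[t] += 1
    correct = sum(max(counts.values()) for counts in per_cluster.values())
    return correct / len(true_labels)


# Toy example: six tissue samples, two true classes, two clusters.
true_classes = ["A", "A", "A", "B", "B", "B"]
clusters = [0, 0, 1, 1, 1, 1]
ari = adjusted_rand_index(true_classes, clusters)
acc = majority_vote_accuracy(true_classes, clusters)
print(round(ari, 3), round(acc, 3))
```

On this toy input one sample of class A lands in the cluster dominated by class B, giving an ARI of about 0.324 and a majority-vote accuracy of about 0.833. The ARI is invariant to how cluster identifiers are numbered, which is why it suits comparing a clustering against class labels.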

Notes

  1. http://algorithmics.molgen.mpg.de/Static/Supplements/.

  2. http://algorithmics.molgen.mpg.de/Static/Supplements/.

  3. http://www.ailab.si/supp/bi-cancer/projections/info/SRBCT.htm.

Abbreviations

SOO: Single objective optimization
MOO: Multi-objective optimization
AMOSA: Archived multi-objective simulated annealing based technique
SA: Simulated annealing
FCM: Fuzzy C-means
MOGA: Multi-objective genetic algorithm

Compliance with ethical standards

We hereby declare that we have not submitted the current paper elsewhere. We have followed the ethical standards of the journal.

Author information

Corresponding author

Correspondence to Sriparna Saha.

Additional information

Communicated by S. Deb, T. Hanne and S. Fong.

About this article

Cite this article

Saha, S., Kaushik, K., Alok, A.K. et al. Multi-objective semi-supervised clustering of tissue samples for cancer diagnosis. Soft Comput 20, 3381–3392 (2016). https://doi.org/10.1007/s00500-015-1783-5

  • DOI: https://doi.org/10.1007/s00500-015-1783-5
