
A new semi-supervised clustering technique using multi-objective optimization

Applied Intelligence

Abstract

Semi-supervised clustering techniques have been proposed in the literature to overcome the problems associated with unsupervised and supervised classification. They consider a small amount of labeled data together with the whole data distribution while clustering a data set. In this paper, a new approach to semi-supervised clustering is implemented using a multiobjective optimization (MOO) framework. Four objective functions are optimized using the search capability of AMOSA, a multiobjective simulated annealing based technique. These objective functions are based on both unsupervised and supervised information. The first three objective functions represent, respectively, the goodness of the partitioning in terms of Euclidean distance, the total symmetry present in the clusters, and the cluster connectedness. For the last objective function, we consider different external cluster validity indices: the adjusted Rand index, the Rand index, a newly developed min-max distance based MMI index, the NMMI index, and the Minkowski Score. Results show that the proposed semi-supervised clustering technique can effectively detect the appropriate number of clusters as well as the appropriate partitioning for data sets having either well-separated clusters of any shape or symmetrical clusters with or without overlaps. Twenty-four artificial and five real-life data sets are used in the evaluation. We develop five versions of the Semi-GenClustMOO clustering technique by varying the external cluster validity index. The partitioning results obtained are compared with those of another recently developed multiobjective semi-supervised clustering technique, Mock-Semi. Finally, the effectiveness of the proposed Semi-GenClustMOO clustering technique is shown in segmenting a remote sensing satellite image of a part of the city of Kolkata.
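To make the supervised objective more concrete, the following is a minimal sketch of three of the external validity indices named in the abstract (Rand index, adjusted Rand index, Minkowski Score), computed from pair counts over the labeled points only. The abstract does not give formulas, so the standard pair-counting definitions are assumed here; the function names are hypothetical, and how the paper itself applies these indices inside AMOSA is not shown. The MMI and NMMI indices are omitted because their definitions do not appear in this text.

```python
from itertools import combinations
from math import sqrt


def pair_counts(true_labels, pred_labels):
    """Classify every pair of labeled points by agreement of the two labelings.

    a: same class and same cluster      b: same class, different cluster
    c: different class, same cluster    d: different class and different cluster
    """
    a = b = c = d = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            a += 1
        elif same_true:
            b += 1
        elif same_pred:
            c += 1
        else:
            d += 1
    return a, b, c, d


def rand_index(true_labels, pred_labels):
    """Fraction of pairs on which the two labelings agree (higher is better)."""
    a, b, c, d = pair_counts(true_labels, pred_labels)
    return (a + d) / (a + b + c + d)


def adjusted_rand_index(true_labels, pred_labels):
    """Rand index corrected for chance agreement (1.0 means identical partitions)."""
    a, b, c, d = pair_counts(true_labels, pred_labels)
    total = a + b + c + d
    expected = (a + b) * (a + c) / total
    maximum = ((a + b) + (a + c)) / 2
    if maximum == expected:  # degenerate case, e.g. a single class and a single cluster
        return 1.0
    return (a - expected) / (maximum - expected)


def minkowski_score(true_labels, pred_labels):
    """Pair-count based Minkowski Score (lower is better, 0.0 is a perfect match)."""
    a, b, c, _ = pair_counts(true_labels, pred_labels)
    if a + b == 0:
        raise ValueError("no pair of labeled points shares a true class")
    return sqrt((b + c) / (a + b))


# Hypothetical labeled subset: known classes vs. clusters assigned by a candidate solution.
known = [0, 0, 0, 1, 1, 1]
assigned = [0, 0, 1, 1, 1, 1]
print(rand_index(known, assigned))
print(adjusted_rand_index(known, assigned))
print(minkowski_score(known, assigned))
```

In a multiobjective setting, one such value would sit alongside the three unsupervised objectives for each candidate partitioning; indices where larger is better (Rand, adjusted Rand) and indices where smaller is better (Minkowski Score) need their optimization direction handled accordingly.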


Notes

  1. http://personalpages.manchester.ac.uk/mbs/julia.handl/



Acknowledgment

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions to improve the quality of the paper.

Author information


Corresponding author

Correspondence to Sriparna Saha.


About this article


Cite this article

Alok, A.K., Saha, S. & Ekbal, A. A new semi-supervised clustering technique using multi-objective optimization. Appl Intell 43, 633–661 (2015). https://doi.org/10.1007/s10489-015-0656-z


  • DOI: https://doi.org/10.1007/s10489-015-0656-z
