Abstract
Semi-supervised clustering techniques have been proposed in the literature to overcome the problems associated with unsupervised and supervised classification. They use a small amount of labeled data together with the whole data distribution while clustering a data set. In this paper, a new approach to semi-supervised clustering is implemented using a multiobjective optimization (MOO) framework. Four objective functions are optimized using the search capability of a multiobjective simulated annealing based technique, AMOSA. These objective functions draw on both unsupervised and supervised information. The first three objective functions represent, respectively, the goodness of the partitioning in terms of Euclidean distance, the total symmetry present in the clusters, and the cluster connectedness. For the last objective function, we consider different external cluster validity indices, including the adjusted Rand index, the Rand index, a newly developed min-max distance based MMI index, the NMMI index, and the Minkowski Score. Results show that the proposed semi-supervised clustering technique can effectively detect the appropriate number of clusters as well as the appropriate partitioning for data sets having either well-separated clusters of any shape or symmetrical clusters with or without overlaps. Twenty-four artificial and five real-life data sets have been used in the evaluation. We develop five different versions of the Semi-GenClustMOO clustering technique by varying the external cluster validity index. The partitioning results obtained are compared with those of another recently developed multiobjective semi-supervised clustering technique, Mock-Semi. At the end of the paper, the effectiveness of the proposed Semi-GenClustMOO clustering technique is shown in segmenting a remote sensing satellite image of a part of the city of Kolkata.
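To make the supervised objective more concrete, the following is a minimal sketch of three of the external validity indices named above (Rand index, adjusted Rand index, and Minkowski Score), written in their standard pair-counting forms. It is not the authors' implementation. The function names, the use of `None` to mark unlabeled points, and the choice to score only the labeled points are assumptions made for illustration; the MMI and NMMI indices are not sketched here.

```python
from itertools import combinations
from math import sqrt


def labeled_subset(true_labels, pred_labels):
    """Keep only points whose true label is known (None marks unlabeled).

    Hypothetical helper: assumes the supervised objective is scored on
    the labeled points only.
    """
    pairs = [(t, p) for t, p in zip(true_labels, pred_labels) if t is not None]
    return [t for t, _ in pairs], [p for _, p in pairs]


def pair_counts(true_labels, pred_labels):
    """Count point pairs by co-membership in the true and predicted partitions."""
    n11 = n10 = n01 = n00 = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            n11 += 1  # together in both partitions
        elif same_true:
            n10 += 1  # together in the true partition only
        elif same_pred:
            n01 += 1  # together in the predicted partition only
        else:
            n00 += 1  # separated in both partitions
    return n11, n10, n01, n00


def rand_index(true_labels, pred_labels):
    """Fraction of pairs on which the two partitions agree (higher is better)."""
    n11, n10, n01, n00 = pair_counts(true_labels, pred_labels)
    total = n11 + n10 + n01 + n00
    if total == 0:
        raise ValueError("need at least two labeled points")
    return (n11 + n00) / total


def adjusted_rand_index(true_labels, pred_labels):
    """Rand index corrected for chance agreement (Hubert and Arabie form)."""
    n11, n10, n01, n00 = pair_counts(true_labels, pred_labels)
    total = n11 + n10 + n01 + n00
    if total == 0:
        raise ValueError("need at least two labeled points")
    expected = (n11 + n10) * (n11 + n01) / total
    max_index = 0.5 * ((n11 + n10) + (n11 + n01))
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (n11 - expected) / (max_index - expected)


def minkowski_score(true_labels, pred_labels):
    """Normalized pair disagreement (lower is better, 0 is a perfect match)."""
    n11, n10, n01, _ = pair_counts(true_labels, pred_labels)
    if n11 + n10 == 0:
        raise ValueError("true partition has no co-clustered pairs")
    return sqrt((n01 + n10) / (n11 + n10))


if __name__ == "__main__":
    # Six points, two of them unlabeled; cluster ids need not match label ids.
    known = [0, 0, None, 1, 1, None]
    found = [2, 2, 2, 3, 3, 3]
    t, p = labeled_subset(known, found)
    print(rand_index(t, p), adjusted_rand_index(t, p), minkowski_score(t, p))
```

Because all three indices depend only on pair co-membership, they are unaffected by how cluster identifiers are numbered. Note that the Rand and adjusted Rand indices are maximized while the Minkowski Score is minimized, so a multiobjective optimizer would need to treat their directions consistently.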
Acknowledgment
The authors would like to thank the anonymous reviewers for their valuable comments and suggestions to improve the quality of the paper.
Cite this article
Alok, A.K., Saha, S. & Ekbal, A. A new semi-supervised clustering technique using multi-objective optimization. Appl Intell 43, 633–661 (2015). https://doi.org/10.1007/s10489-015-0656-z
DOI: https://doi.org/10.1007/s10489-015-0656-z