Abstract
This paper first proposes a new point symmetry-based similarity measure that satisfies the closure and symmetry properties of a distance function, and explains the desirable properties of the new distance in detail. It then develops a clustering algorithm that relies on the search capability of genetic algorithms (GAs) and uses the new point symmetry-based distance for cluster assignment. Points are allocated to clusters in such a way that the closure property is satisfied. The resulting algorithm, GA with newly developed point symmetry distance (GAnPS), can detect symmetrical clusters of any size or convexity. Its effectiveness in identifying the proper partitioning is shown on twenty-one data sets with varied characteristics. GAnPS is compared with an existing symmetry-based genetic clustering technique, GAPS, and with three popular clustering techniques: K-means, expectation maximization and the average linkage algorithm. The utility of the proposed technique is also shown for partitioning a remote sensing satellite image. The last part of the paper develops some automatic clustering techniques using the newly proposed symmetry-based distance.
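The abstract does not define the new distance, so the sketch below illustrates only the general idea of a point symmetry-based distance, in the style usually attributed to the earlier GAPS method: reflect a point about a candidate cluster centre, measure how close the reflection lies to actual data points, and scale that by the Euclidean distance to the centre. It is not the measure proposed in this paper. The names `ps_distance` and `assign`, the parameter `knear` and its default of 2 are assumptions made for illustration, and `assign` is a simplified rule that ignores the closure-preserving allocation described above.

```python
import numpy as np


def ps_distance(x, center, data, knear=2):
    """Illustrative point symmetry-based distance of x with respect to center.

    Assumed form (GAPS-style, not this paper's new measure):
    d_ps = d_sym * d_euclid, where d_sym is the mean distance from the
    reflected point 2*center - x to its knear nearest data points.
    """
    x = np.asarray(x, dtype=float)
    center = np.asarray(center, dtype=float)
    data = np.asarray(data, dtype=float)

    reflected = 2.0 * center - x                      # mirror image of x about center
    dists = np.linalg.norm(data - reflected, axis=1)  # distance to every data point
    d_sym = np.sort(dists)[:knear].mean()             # mean of the knear smallest
    d_euclid = np.linalg.norm(x - center)
    return d_sym * d_euclid


def assign(x, centers, data, knear=2):
    """Hypothetical simplified assignment: index of the centre with smallest d_ps."""
    return int(np.argmin([ps_distance(x, c, data, knear) for c in centers]))
```

For a data set that is symmetric about a centre, the reflected point lands on or near a real data point, so the symmetry term is small and the point is drawn to that centre even when the cluster is not compact in the Euclidean sense.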
Ethics declarations
Conflict of interest
The author does not have any conflict of interest with the journal.
Additional information
Communicated by A. Di Nola.
Cite this article
Saha, S. Enhancing point symmetry-based distance for data clustering. Soft Comput 22, 409–436 (2018). https://doi.org/10.1007/s00500-016-2477-3
DOI: https://doi.org/10.1007/s00500-016-2477-3