Gravitational search algorithm and K-means for simultaneous feature selection and data clustering: a multi-objective approach

  • Methodologies and Application
  • Published in: Soft Computing

Abstract

Clustering is an unsupervised classification method used to group the objects of an unlabeled data set. High-dimensional data sets generally comprise irrelevant and redundant features along with the relevant ones, which deteriorates the clustering result. Feature selection is therefore necessary: selecting a subset of relevant features improves the discrimination ability of the original feature set and, in turn, the clustering result. Although many metaheuristics have been suggested to select a subset of relevant features in a wrapper framework based on some criterion, most of them are marred by three key issues. First, they require class information of the objects a priori, which is unknown in unsupervised feature selection. Second, feature subset selection is devised on a single validity measure; hence, it produces a single best solution biased toward the cardinality of the feature subset. Third, they have difficulty avoiding local optima owing to a lack of balance between exploration and exploitation of the feature search space. To deal with the first issue, we use an unsupervised feature selection method in which no class information is required. To address the second issue, we follow a Pareto-based approach to obtain diverse trade-off solutions by optimizing two conceptually conflicting validity measures, the silhouette index (Sil) and the feature cardinality (d). For the third issue, we introduce a genetic crossover operator to improve diversity in the binary gravitational search algorithm (BGSA), a recent metaheuristic based on the Newtonian law of gravity, in a multi-objective optimization scenario; the resulting method is named improved multi-objective BGSA for feature selection (IMBGSAFS). We use ten real-world data sets to compare IMBGSAFS with three multi-objective wrapper methods, MBGSA, MOPSO, and NSGA-II, and with Pearson's linear correlation coefficient (FM-CC) as a multi-objective filter method. We employ four multi-objective quality measures: convergence, diversity, coverage, and ONVG. The obtained results show the superiority of IMBGSAFS over its competitors. An external clustering validity index, the F-measure, also confirms this finding. As the decision maker picks only a single solution from the set of trade-off solutions, we employ the F-measure to select a final single solution from the external archive. The final solution achieved by IMBGSAFS is superior to those of its competitors in terms of clustering accuracy and/or smaller subset size.
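
The approach couples K-means-based cluster evaluation with multi-objective feature selection: each candidate feature subset is scored by the silhouette index (Sil, to be maximized) of a K-means partition built on the selected features and by the subset cardinality (d, to be minimized), and Pareto dominance retains the trade-off solutions. The sketch below illustrates only this evaluation-and-filtering step, assuming Python with scikit-learn; the binary-mask encoding, K-means settings, and toy data are illustrative assumptions, not the authors' implementation, which drives the search with the improved multi-objective BGSA.

```python
# Minimal sketch (not the authors' code): evaluate the two objectives (Sil, d)
# for a candidate feature subset and filter the non-dominated candidates.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def evaluate(X, mask, k):
    """Return (Sil, d): silhouette of a K-means partition on the selected
    features (maximize) and the feature-subset cardinality (minimize)."""
    d = int(mask.sum())
    if d == 0:
        return -1.0, 0                      # empty subset: worst silhouette
    Xs = X[:, mask.astype(bool)]
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Xs)
    return silhouette_score(Xs, labels), d

def pareto_front(objs):
    """Indices of candidates not dominated by any other candidate
    (higher Sil and lower d win, with at least one strict improvement)."""
    front = []
    for i, (sil_i, d_i) in enumerate(objs):
        dominated = any(
            (sil_j >= sil_i and d_j <= d_i) and (sil_j > sil_i or d_j < d_i)
            for j, (sil_j, d_j) in enumerate(objs) if j != i
        )
        if not dominated:
            front.append(i)
    return front

# Example: score a few random binary feature masks on a toy data set.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 8))
masks = (rng.random((5, 8)) > 0.5).astype(int)
objs = [evaluate(X, m, k=3) for m in masks]
print("Pareto-optimal candidates:", pareto_front(objs))
```

In the paper, a single final solution is then picked from the archived Pareto front using the external F-measure, which relies on the true class labels and is therefore used only for evaluation and decision making, not inside the search.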


Notes

  1. http://archive.ics.uci.edu/ml/datasets.html.


Author information

Corresponding author

Correspondence to Jay Prakash.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest. This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by V. Loia.


About this article


Cite this article

Prakash, J., Singh, P.K. Gravitational search algorithm and K-means for simultaneous feature selection and data clustering: a multi-objective approach. Soft Comput 23, 2083–2100 (2019). https://doi.org/10.1007/s00500-017-2923-x

