Abstract
Clustering is an unsupervised classification method used to group the objects of an unlabeled data set. High-dimensional data sets generally contain irrelevant and redundant features along with the relevant ones, which deteriorates the clustering result. Feature selection is therefore necessary: selecting a subset of relevant features improves the discrimination ability of the original feature set and thereby the clustering result. Although many metaheuristics have been suggested to select a subset of relevant features in a wrapper framework based on some criterion, most of them are marred by three key issues. First, they require object class information a priori, which is unavailable in unsupervised feature selection. Second, feature subset selection is based on a single validity measure; hence, it produces a single best solution biased toward the cardinality of the feature subset. Third, they have difficulty avoiding local optima owing to a lack of balance between exploration and exploitation in the feature search space. To deal with the first issue, we use an unsupervised feature selection method in which no class information is required. To address the second issue, we follow a Pareto-based approach to obtain diverse trade-off solutions by optimizing two conceptually contradictory validity measures, the silhouette index (Sil) and feature cardinality (d). For the third issue, we introduce a genetic crossover operator to improve diversity in the binary gravitational search algorithm (BGSA), a recent metaheuristic based on the Newtonian law of gravity, in a multi-objective optimization scenario; the resulting method is named improved multi-objective BGSA for feature selection (IMBGSAFS). We use ten real-world data sets to compare the IMBGSAFS results with three multi-objective wrapper methods, MBGSA, MOPSO, and NSGA-II, and with a Pearson's linear correlation coefficient-based multi-objective filter method (FM-CC).
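The Pareto-based step described above keeps only solutions that are not dominated in both objectives: higher silhouette index (Sil, maximized) and lower feature cardinality (d, minimized). A minimal sketch of that dominance filtering is shown below; the candidate (Sil, d) tuples are illustrative values, not results from the paper, and the function names are ours.

```python
# Hedged sketch of Pareto-dominance filtering over the two objectives used
# in the paper: maximize silhouette index (sil), minimize cardinality (d).
# Candidate tuples are hypothetical, chosen only to illustrate the filter.

def dominates(a, b):
    """a = (sil, d). a dominates b if a is no worse in both objectives
    (higher-or-equal sil, lower-or-equal d) and strictly better in one."""
    return (a[0] >= b[0] and a[1] <= b[1]) and (a[0] > b[0] or a[1] < b[1])

def pareto_front(candidates):
    """Return the non-dominated subset -- the trade-off solutions an
    external archive would retain."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o is not c)]

candidates = [(0.62, 12), (0.55, 4), (0.62, 9), (0.48, 4), (0.70, 15)]
front = sorted(pareto_front(candidates))
# (0.62, 12) is dominated by (0.62, 9); (0.48, 4) by (0.55, 4).
print(front)
```

Each surviving tuple trades some silhouette quality for a smaller feature subset, which is exactly why a decision maker still has to pick one final solution from the archive.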
We employ four multi-objective quality measures: convergence, diversity, coverage, and ONVG. The obtained results show the superiority of IMBGSAFS over its competitors. An external clustering validity index, the F-measure, also confirms this finding. As the decision maker picks only a single solution from the set of trade-off solutions, we employ the F-measure to select a final solution from the external archive. The final solution achieved by IMBGSAFS is superior to those of its competitors in terms of clustering accuracy and/or smaller subset size.
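The F-measure used for picking the final solution is an external validity index that compares predicted clusters against true class labels. One common class-weighted formulation is sketched below; the paper's exact variant may differ in detail, so treat this as an illustrative definition rather than the authors' implementation.

```python
# Hedged sketch of a class-weighted clustering F-measure: for each true
# class, take the best F1 over all predicted clusters, then weight by
# class size. This is one common variant of the external index.
from collections import Counter

def f_measure(labels_true, labels_pred):
    n = len(labels_true)
    score = 0.0
    for c, n_c in Counter(labels_true).items():
        best = 0.0
        for k in set(labels_pred):
            # true positives: objects of class c placed in cluster k
            tp = sum(1 for t, p in zip(labels_true, labels_pred)
                     if t == c and p == k)
            if tp == 0:
                continue
            size_k = sum(1 for p in labels_pred if p == k)
            prec, rec = tp / size_k, tp / n_c
            best = max(best, 2 * prec * rec / (prec + rec))
        score += (n_c / n) * best  # weight by class size
    return score
```

A perfect clustering scores 1.0 regardless of how cluster labels are numbered, which is what makes the index usable for ranking archive members without a label-matching step.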
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest. This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by V. Loia.
Cite this article
Prakash, J., Singh, P.K. Gravitational search algorithm and K-means for simultaneous feature selection and data clustering: a multi-objective approach. Soft Comput 23, 2083–2100 (2019). https://doi.org/10.1007/s00500-017-2923-x