Abstract
This paper proposes a novel approach that automatically and simultaneously selects the number of clusters and the relevant features. The gravitational search algorithm is used as the underlying metaheuristic. A novel agent representation scheme encodes the cluster centers and the number of features. The algorithm determines the optimal number of clusters, and the features relevant to those clusters, at run time. A new threshold-setting concept is introduced. The variance (a statistical property) of the dataset is exploited. To make the search efficient, a novel clustering criterion is used. The proposed approach is compared with recently developed, well-known clustering techniques and is further applied to the analysis of microarray data. Statistical and biological significance tests are performed to demonstrate the efficiency of the proposed approach. The results support the effectiveness and accuracy of the proposed algorithm.
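The abstract names the gravitational search algorithm as the metaheuristic but does not spell out its update rule. As a rough illustration of that search loop only, the following is a minimal sketch of the standard algorithm applied to a toy objective. It does not reproduce the paper's agent encoding, threshold setting, or clustering criterion, none of which are specified here. The names `gsa_minimize` and `sphere`, the parameter defaults, and the clamping of positions to the bounds are all assumptions made for illustration.

```python
import math
import random


def sphere(x):
    """Toy objective (hypothetical stand-in for a clustering criterion)."""
    return sum(v * v for v in x)


def gsa_minimize(objective, dim, bounds, n_agents=20, iters=100,
                 g0=100.0, alpha=20.0, seed=0):
    """Minimal gravitational search for minimization (illustrative only)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_agents)]
    vel = [[0.0] * dim for _ in range(n_agents)]
    best_x, best_f = None, math.inf

    for t in range(iters):
        fit = [objective(x) for x in pos]
        # Track the best solution seen so far.
        i_best = min(range(n_agents), key=fit.__getitem__)
        if fit[i_best] < best_f:
            best_f, best_x = fit[i_best], pos[i_best][:]

        # Normalised masses: a lower (better) fitness gives a heavier agent.
        worst, best = max(fit), min(fit)
        if worst == best:
            mass = [1.0 / n_agents] * n_agents
        else:
            raw = [(worst - f) / (worst - best) for f in fit]
            total = sum(raw)
            mass = [m / total for m in raw]

        # Gravitational constant decays exponentially over the run.
        g = g0 * math.exp(-alpha * t / iters)
        # Only the k heaviest agents attract others; k shrinks from N to 1.
        k = max(1, round(n_agents - (n_agents - 1) * t / max(1, iters - 1)))
        heavy = sorted(range(n_agents), key=mass.__getitem__, reverse=True)[:k]

        # Compute every acceleration from the current positions first.
        accs = []
        for i in range(n_agents):
            acc = [0.0] * dim
            for j in heavy:
                if j == i:
                    continue
                dist = math.dist(pos[i], pos[j])
                pull = rng.random() * g * mass[j] / (dist + 1e-12)
                for d in range(dim):
                    acc[d] += pull * (pos[j][d] - pos[i][d])
            accs.append(acc)

        # Then move all agents, clamping to the search bounds (an assumption).
        for i in range(n_agents):
            for d in range(dim):
                vel[i][d] = rng.random() * vel[i][d] + accs[i][d]
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))

    # Evaluate the final positions once more before returning.
    for x in pos:
        f = objective(x)
        if f < best_f:
            best_f, best_x = f, x[:]
    return best_x, best_f


if __name__ == "__main__":
    print(gsa_minimize(sphere, dim=2, bounds=(-5.0, 5.0)))
```

In a clustering setting, the objective would be replaced by a clustering criterion evaluated on the partition that each agent encodes. How the paper maps an agent to cluster centers and selected features is not described in this section, so that step is deliberately left out of the sketch.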
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Kumar, V., Kumar, D. Automatic clustering and feature selection using gravitational search algorithm and its application to microarray data analysis. Neural Comput & Applic 31, 3647–3663 (2019). https://doi.org/10.1007/s00521-017-3321-0
DOI: https://doi.org/10.1007/s00521-017-3321-0