Automatic clustering and feature selection using gravitational search algorithm and its application to microarray data analysis

  • Original Article
  • Neural Computing and Applications

Abstract

This paper proposes a novel approach that automatically and simultaneously selects the number of clusters and the relevant features. The gravitational search algorithm is used as the metaheuristic. A novel agent representation scheme encodes the cluster centers and the number of features, so the algorithm can find the optimal number of clusters and the features relevant to those clusters at run time. A new threshold-setting concept is introduced, and the variance of the dataset (a statistical property) is exploited. To make the search efficient, a novel clustering criterion is used. The proposed approach is compared with recently developed, well-known clustering techniques and is further applied to the analysis of microarray data. Statistical and biological significance tests are performed to demonstrate the efficiency of the proposed approach. The results show the effectiveness and accuracy of the proposed algorithm.

Author information

Corresponding author

Correspondence to Vijay Kumar.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Kumar, V., Kumar, D. Automatic clustering and feature selection using gravitational search algorithm and its application to microarray data analysis. Neural Comput & Applic 31, 3647–3663 (2019). https://doi.org/10.1007/s00521-017-3321-0

  • DOI: https://doi.org/10.1007/s00521-017-3321-0
