
A new feature subset selection using bottom-up clustering


Abstract

Feature subset selection and/or dimensionality reduction is an essential preprocessing step before performing any data mining task, especially when the problem space contains many features. In this paper, a clustering-based feature subset selection (CFSS) algorithm is proposed to identify the more relevant features. At each level of agglomeration, it uses a similarity measure among features to merge the two most similar clusters of features. By gathering similar features into clusters and then introducing a representative feature for each cluster, it removes redundant features. To identify the representative features, a criterion based on mutual information is proposed. Since CFSS specifies the representatives in a filter manner, it is notably fast. As an advantage of hierarchical clustering, it does not require the number of clusters to be fixed in advance. In CFSS, the clustering process is repeated until all features are distributed among clusters. To distribute the features into a reasonable number of clusters, however, a recently proposed approach is used to obtain a suitable level for cutting the clustering tree. To assess the performance of CFSS, we have applied it to several UCI datasets and compared it with some popular feature selection methods. The experimental results demonstrate the efficiency and speed of the proposed method.
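As a concrete illustration of the pipeline the abstract describes, the sketch below clusters features bottom-up on a pairwise similarity matrix, cuts the tree, and keeps the feature with the highest mutual information from each cluster. It is a minimal approximation, not the paper's implementation: absolute Pearson correlation as the similarity measure, average linkage, and a caller-supplied cut level `n_clusters` are all illustrative assumptions (the paper derives the cut level from a separately proposed criterion rather than taking it as a parameter).

```python
# Minimal sketch of clustering-based feature subset selection in the
# spirit of CFSS. Assumptions (not from the paper): absolute Pearson
# correlation as feature similarity, average linkage, fixed cut level.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.feature_selection import mutual_info_classif


def cfss_like_select(X, y, n_clusters=5):
    """Return indices of one representative feature per cluster."""
    # Pairwise feature similarity: absolute correlation between columns.
    corr = np.abs(np.corrcoef(X, rowvar=False))
    # Convert similarity to a distance matrix for hierarchical clustering.
    dist = 1.0 - corr
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)
    # Bottom-up (agglomerative) clustering of the features.
    Z = linkage(condensed, method="average")
    # Cut the clustering tree into n_clusters groups of features.
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    # Mutual information between each feature and the class labels.
    mi = mutual_info_classif(X, y, random_state=0)
    # Keep the highest-MI feature as each cluster's representative.
    selected = []
    for c in np.unique(labels):
        members = np.where(labels == c)[0]
        selected.append(int(members[np.argmax(mi[members])]))
    return sorted(selected)


if __name__ == "__main__":
    # Demo on a small UCI-style benchmark bundled with scikit-learn.
    from sklearn.datasets import load_wine

    data = load_wine()
    idx = cfss_like_select(data.data, data.target, n_clusters=6)
    print("selected feature indices:", idx)
    print("selected features:", [data.feature_names[i] for i in idx])
```

Because the representative is chosen by a simple per-feature score rather than by training a model, the selection runs in a filter manner, which is the source of the speed the abstract claims.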



Author information

Correspondence to Eghbal G. Mansoori.


Cite this article

Dehghan, Z., Mansoori, E.G. A new feature subset selection using bottom-up clustering. Pattern Anal Applic 21, 57–66 (2018). https://doi.org/10.1007/s10044-016-0565-8
