Classifier-independent feature selection on the basis of divergence criterion

  • Theoretical Advances
  • Published in Pattern Analysis and Applications

Abstract

Feature selection aims to choose, from the original feature set, a subset that carries the most discriminative information. In practical cases, it is preferable to select a feature subset that is universally effective for any kind of classifier, because there is no prior knowledge of which classifier suits a given dataset. Such an approach is called classifier-independent feature selection. We focus on the study of Novovičová et al. as a classifier-independent feature selection method. However, in their method the number of features has to be specified beforehand. It is more desirable to determine the feature subset size automatically, so that only garbage features are removed. In this study, we propose a divergence criterion on the basis of Novovičová et al.'s method.
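
To make the idea concrete, the following is a minimal sketch of divergence-based, classifier-independent feature selection. It is not the authors' algorithm (which models class-conditional densities with mixture models): here each feature is scored by a symmetric Kullback–Leibler (J-)divergence between single-Gaussian fits of the two classes, and every feature whose score clears a relative threshold is kept, so the subset size is not fixed in advance and only low-divergence ("garbage") features are dropped. The function names and the rel_threshold parameter are illustrative assumptions.

import numpy as np

def gaussian_kl(mu0, var0, mu1, var1):
    # KL divergence KL( N(mu0, var0) || N(mu1, var1) ) between univariate Gaussians
    return 0.5 * (np.log(var1 / var0) + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0)

def feature_divergences(X, y):
    # Symmetric (J-)divergence of each feature between classes 0 and 1,
    # assuming a univariate Gaussian class-conditional density per feature.
    eps = 1e-12                                  # guard against zero variance
    X0, X1 = X[y == 0], X[y == 1]
    scores = []
    for j in range(X.shape[1]):
        mu0, var0 = X0[:, j].mean(), X0[:, j].var() + eps
        mu1, var1 = X1[:, j].mean(), X1[:, j].var() + eps
        scores.append(gaussian_kl(mu0, var0, mu1, var1)
                      + gaussian_kl(mu1, var1, mu0, var0))
    return np.array(scores)

def select_features(X, y, rel_threshold=0.05):
    # Keep every feature whose divergence exceeds a fraction of the largest one;
    # the subset size follows from the threshold instead of being fixed beforehand.
    scores = feature_divergences(X, y)
    kept = np.where(scores >= rel_threshold * scores.max())[0]
    return kept, scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    y = rng.integers(0, 2, n)
    informative = rng.normal(loc=2.0 * y[:, None], scale=1.0, size=(n, 2))
    garbage = rng.normal(size=(n, 3))            # class-independent noise features
    X = np.hstack([informative, garbage])
    kept, scores = select_features(X, y)
    print("divergence per feature:", np.round(scores, 3))
    print("selected feature indices:", kept)

In this synthetic example the two class-dependent features should score well above the three noise features and be the only ones kept; the relative threshold is a crude stand-in for the automatic subset-size determination the paper aims at.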

References

  1. Sebban M, Nock R (2002) A hybrid filter/wrapper approach of feature selection using information theory. Pattern Recognit 35(4):835–846

  2. Somol P, Pudil P, Kittler J (2004) Fast branch & bound algorithms for optimal feature selection. IEEE Trans Pattern Anal Mach Intell 26(7):900–912

  3. Oh IS, Lee JS, Moon BR (2004) Hybrid genetic algorithms for feature selection. IEEE Trans Pattern Anal Mach Intell 26(11):1424–1437

  4. Pudil P, Novovičová J, Kittler J (1994) Floating search methods in feature selection. Pattern Recognit Lett 15(11):1119–1125

  5. Kira K, Rendell L (1992) A practical approach to feature selection. In: Sleeman D, Edwards P (eds) Proceedings of the 9th International Conference on Machine Learning, pp 249–256

  6. Kononenko I (1994) Estimating attributes: analysis and extensions of RELIEF. In: Bergadano F, De Raedt L (eds) Proceedings of the European Conference on Machine Learning, pp 171–182

  7. Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1-2):273–324

  8. Almuallim H, Dietterich TG (1991) Learning with many irrelevant features. In: Maybury MT (ed) Proceedings of the 9th National Conference on Artificial Intelligence, pp 547–552

  9. Ferri FJ, Pudil P, Hatef M, Kittler J (1994) Comparative study of techniques for large-scale feature selection. In: Gelsema ES, Kanal LN (eds) Pattern Recognition in Practice, vol IV, Elsevier, pp 403–413

  10. Blum AL, Langley P (1997) Selection of relevant features and examples in machine learning. Artif Intell 97(1-2):245–271

  11. Kudo M, Sklansky J (2000) Comparison of algorithms that select features for pattern classifiers. Pattern Recognit 33(1):25–41

  12. Hong S (1997) Use of contextual information for feature ranking and discretization. IEEE Trans Knowl Data Engineering 9(5):718–730

  13. Hall M (2000) Correlation-based feature selection for discrete and numeric class machine learning. In: Langley P (ed) Proceedings of the 17th International Conference on Machine Learning, pp 259–266

  14. Yu L, Liu H (2004) Efficient feature selection via analysis of relevance and redundancy. J Mach Learn Res 5:1205–1224

  15. Koller D, Sahami M (1996) Toward optimal feature selection. In: Saitta L (ed) Proceedings of the 13th International Conference on Machine Learning, pp 284–292

  16. Kwak N, Choi CH (2002) Input feature selection by mutual information based on Parzen window. IEEE Trans Pattern Anal Mach Intell 24(12):1667–1671

  17. Singh S (2003) PRISM—a novel framework for pattern recognition. Pattern Anal Appl 6(2):131–149

  18. Singh S (2003) Multiresolution estimates of classification complexity. IEEE Trans Pattern Anal Mach Intell 25(12):1534–1539

  19. Ho TK, Basu M (2000) Measuring the complexity of classification problems. In: Sanfeliu A, Villanueva JJ et al. (eds) Proceedings of the 15th International Conference on Pattern Recognition, vol 2, pp 43–47

  20. Ho TK, Basu M (2002) Complexity measures of supervised classification problems. IEEE Trans Pattern Anal Mach Intell 24(3):289–300

  21. Novovičová J, Pudil P, Kittler J (1996) Divergence based feature selection for multimodal class densities. IEEE Trans Pattern Anal Mach Intell 18(2):218–223

  22. Kudo M, Shimbo M (1993) Feature selection based on the structural indices of categories. Pattern Recognit 26(6):891–901

  23. Holz HJ, Loew MH (1994) Relative feature importance: a classifier-independent approach to feature selection. In: Gelsema ES, Kanal LN (eds) Pattern Recognition in Practice, vol IV, Elsevier, pp 473–487

  24. Boekee DE, Van der Lubbe JCA (1979) Some aspects of error bounds in feature selection. Pattern Recognit 11(5-6):353–360

  25. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39(1):1–38

  26. Ichimura N (1995) Robust clustering based on a maximum likelihood method for estimation of the suitable number of clusters. Transactions of the Institute of Electronics Information and Communication Engineers J78-D-II(8):1184–1195 (in Japanese)

  27. Kudo M, Sklansky J (1998) Classifier-independent feature selection for two-stage feature selection. In: Amin A, Dori D, Pudil P, Freeman H (eds) Proceedings of the Joint IAPR International Workshops on SSPR’98 and SPR’98, pp 548–554

  28. Vapnik VN, Chervonenkis AY (1971) On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab Appl 16(2):264–280

  29. Friedman JH (1977) A recursive partitioning decision rule for nonparametric classification. IEEE Trans Comput 26:404–408

  30. Newman DJ, Hettich S, Blake CL, Merz CJ (1998) UCI Repository of Machine Learning Databases. Department of Information and Computer Science, University of California, Irvine. http://www.ics.uci.edu/~mlearn/MLRepository.html

  31. Kudo M, Yanagi S, Shimbo M (1996) Construction of class regions by a randomized algorithm: a randomized subclass method. Pattern Recognit 29(4):581–588

  32. Vapnik V (1995) The nature of statistical learning theory. Springer, Berlin Heidelberg New York

  33. Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann

  34. Collobert R, Bengio S (2001) SVMTorch: support vector machines for large-scale regression problems. J Mach Learn Res 1:143–160

Acknowledgments

The authors would like to thank the anonymous reviewers for their helpful comments, which improved the manuscript.

Author information

Corresponding author

Correspondence to Naoto Abe.

Cite this article

Abe, N., Kudo, M., Toyama, J. et al. Classifier-independent feature selection on the basis of divergence criterion. Pattern Anal Applic 9, 127–137 (2006). https://doi.org/10.1007/s10044-006-0030-1
