Abstract
Feature selection aims to choose, from the original feature set, a feature subset that carries the most discriminative information. In practice, it is preferable to select a feature subset that is universally effective for any kind of classifier, because in many cases no prior knowledge about the given dataset is available. Such an approach is called classifier-independent feature selection. We focus on the study of Novovičová et al. as a classifier-independent feature selection method. However, their method requires the number of features to be fixed beforehand. It is more desirable to determine the feature subset size automatically, so that only garbage (uninformative) features are removed. In this study, we propose a divergence criterion based on Novovičová et al.’s method.
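The abstract does not spell out the proposed criterion, so the following is only a simplified illustration of the general idea: score features by a class-separating divergence and let that score, not a preset count, decide how many features to keep. It is neither Novovičová et al.'s mixture-model method nor the authors' criterion. It assumes two classes, models each class as an axis-aligned (diagonal) Gaussian so that the J-divergence (symmetric Kullback–Leibler divergence) splits into per-feature terms, and keeps the smallest top-ranked subset that retains a fraction `1 - tol` of the total divergence. The function names and the `tol` and `var_floor` parameters are hypothetical.

```python
import statistics
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]  # rows = samples, columns = features


def feature_j_divergence(class_a: Matrix, class_b: Matrix,
                         var_floor: float = 1e-9) -> List[float]:
    """Per-feature J-divergence between two classes.

    Assumption (illustrative only): each class is a Gaussian with a diagonal
    covariance, so the total J-divergence is the sum of per-feature terms.
    """
    n_features = len(class_a[0])
    scores = []
    for j in range(n_features):
        xa = [row[j] for row in class_a]
        xb = [row[j] for row in class_b]
        ma, mb = statistics.mean(xa), statistics.mean(xb)
        # Floor the variances so a constant feature does not divide by zero.
        va = max(statistics.pvariance(xa, ma), var_floor)
        vb = max(statistics.pvariance(xb, mb), var_floor)
        # J = KL(a||b) + KL(b||a); the log-variance terms cancel.
        j_div = (0.5 * (va / vb + vb / va - 2.0)
                 + 0.5 * (ma - mb) ** 2 * (1.0 / va + 1.0 / vb))
        scores.append(j_div)
    return scores


def select_features(class_a: Matrix, class_b: Matrix,
                    tol: float = 0.01) -> List[int]:
    """Keep top-ranked features until a (1 - tol) share of the total
    divergence is retained; the subset size is not fixed in advance."""
    scores = feature_j_divergence(class_a, class_b)
    total = sum(scores)
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    selected, kept = [], 0.0
    for j in order:
        if kept >= (1.0 - tol) * total:
            break  # remaining features add (almost) no divergence: treat as garbage
        selected.append(j)
        kept += scores[j]
    return sorted(selected)
```

For example, if feature 0 separates the two classes and feature 1 has the same distribution in both, `select_features` returns `[0]`. Under the diagonal assumption the criterion is additive, so ranking plus a cumulative threshold is sufficient; with full covariances or mixture densities, as in Novovičová et al., features interact and a search over subsets would be needed instead.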
References
Sebban M, Nock R (2002) A hybrid filter/wrapper approach of feature selection using information theory. Pattern Recognit 35(4):835–846
Somol P, Pudil P, Kittler J (2004) Fast branch & bound algorithms for optimal feature selection. IEEE Trans Pattern Anal Mach Intell 26(7):900–912
Oh IS, Lee JS, Moon BR (2004) Hybrid genetic algorithms for feature selection. IEEE Trans Pattern Anal Mach Intell 26(11):1424–1437
Pudil P, Novovičová J, Kittler J (1994) Floating search methods in feature selection. Pattern Recognit Lett 15(11):1119–1125
Kira K, Rendell L (1992) A practical approach to feature selection. In: Sleeman D, Edwards P (eds) Proceedings of the 9th International Conference on Machine Learning, pp 249–256
Kononenko I (1994) Estimating attributes: analysis and extensions of RELIEF. In: Bergadano F, De Raedt L (eds) Proceedings of the European Conference on Machine Learning, pp 171–182
Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1-2):273–324
Almuallim H, Dietterich TG (1991) Learning with many irrelevant features. In: Maybury MT (ed) Proceedings of the 9th National Conference on Artificial Intelligence, pp 547–552
Ferri FJ, Pudil P, Hatef M, Kittler J (1994) Comparative study of techniques for large-scale feature selection. In: Gelsema ES, Kanal LN (eds) Pattern Recognition in Practice, vol IV, Elsevier, pp 403–413
Blum AL, Langley P (1997) Selection of relevant features and examples in machine learning. Artif Intell 97(1-2):245–271
Kudo M, Sklansky J (2000) Comparison of algorithms that select features for pattern classifiers. Pattern Recognit 33(1):25–41
Hong S (1997) Use of contextual information for feature ranking and discretization. IEEE Trans Knowl Data Eng 9(5):718–730
Hall M (2000) Correlation-based feature selection for discrete and numeric class machine learning. In: Langley P (ed) Proceedings of the 17th International Conference on Machine Learning, pp 259–266
Yu L, Liu H (2004) Efficient feature selection via analysis of relevance and redundancy. J Mach Learn Res 5:1205–1224
Koller D, Sahami M (1996) Toward optimal feature selection. In: Saitta L (ed) Proceedings of the 13th International Conference on Machine Learning, pp 284–292
Kwak N, Choi CH (2002) Input feature selection by mutual information based on Parzen window. IEEE Trans Pattern Anal Mach Intell 24(12):1667–1671
Singh S (2003) PRISM—a novel framework for pattern recognition. Pattern Anal Appl 6(2):131–149
Singh S (2003) Multiresolution estimates of classification complexity. IEEE Trans Pattern Anal Mach Intell 25(12):1534–1539
Ho TK, Basu M (2000) Measuring the complexity of classification problems. In: Sanfeliu A, Villanueva JJ et al. (eds) Proceedings of the 15th International Conference on Pattern Recognition, vol 2, pp 43–47
Ho TK, Basu M (2002) Complexity measures of supervised classification problems. IEEE Trans Pattern Anal Mach Intell 24(3):289–300
Novovičová J, Pudil P, Kittler J (1996) Divergence based feature selection for multimodal class densities. IEEE Trans Pattern Anal Mach Intell 18(2):218–223
Kudo M, Shimbo M (1993) Feature selection based on the structural indices of categories. Pattern Recognit 26(6):891–901
Holz HJ, Loew MH (1994) Relative feature importance: a classifier-independent approach to feature selection. In: Gelsema ES, Kanal LN (eds) Pattern Recognition in Practice, vol IV, Elsevier, pp 473–487
Boekee DE, Van der Lubbe JCA (1979) Some aspects of error bounds in feature selection. Pattern Recognit 11(5-6):353–360
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39(1):1–38
Ichimura N (1995) Robust clustering based on a maximum likelihood method for estimation of the suitable number of clusters. Transactions of the Institute of Electronics, Information and Communication Engineers J78-D-II(8):1184–1195 (in Japanese)
Kudo M, Sklansky J (1998) Classifier-independent feature selection for two-stage feature selection. In: Amin A, Dori D, Pudil P, Freeman H (eds) Proceedings of the Joint IAPR International Workshops on SSPR’98 and SPR’98, pp 548–554
Vapnik VN, Chervonenkis AY (1971) On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab Appl 16(2):264–280
Friedman JH (1977) A recursive partitioning decision rule for nonparametric classification. IEEE Trans Comput 26:404–408
Newman DJ, Hettich S, Blake CL, Merz CJ (1998) UCI Repository of Machine Learning Databases. Department of Information and Computer Science, University of California, Irvine. http://www.ics.uci.edu/~mlearn/MLRepository.html
Kudo M, Yanagi S, Shimbo M (1996) Construction of class regions by a randomized algorithm: a randomized subclass method. Pattern Recognit 29(4):581–588
Vapnik V (1995) The nature of statistical learning theory. Springer, Berlin Heidelberg New York
Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann
Collobert R, Bengio S (2001) SVMTorch: support vector machines for large-scale regression problems. J Mach Learn Res 1:143–160
Acknowledgments
The authors would like to thank the anonymous reviewers who gave us helpful comments to improve the manuscript.
Cite this article
Abe, N., Kudo, M., Toyama, J. et al. Classifier-independent feature selection on the basis of divergence criterion. Pattern Anal Applic 9, 127–137 (2006). https://doi.org/10.1007/s10044-006-0030-1