Abstract
Feature selection is usually motivated by reduced computational cost, economy of measurement, and improved problem understanding, but in many cases it can also improve classification accuracy. In this paper we investigate the relationship between the optimal number of features and the training set size. We present a new and simple analysis of the well-studied two-Gaussian setting. We explicitly find the optimal number of features as a function of the training set size for a few special cases and show that accuracy declines dramatically when too many features are added. We then show empirically that the Support Vector Machine (SVM), which was designed to work in the presence of a large number of features, produces the same qualitative behavior in these examples. This suggests that good feature selection is still an important component of accurate classification.
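To make the peaking phenomenon concrete, here is a minimal simulation sketch, not the paper's exact experiment or analysis. It assumes two Gaussian classes with identity covariance and means ±μ, where the i-th mean component decays as 1/i (an illustrative choice in the spirit of Trunk-style constructions, not taken from the paper), and trains a plug-in nearest-mean linear classifier on the first d features of a small training set.

```python
# Minimal sketch (not the authors' exact experiment): the "peaking"
# phenomenon in a two-Gaussian setting. Classes are N(+mu, I) and
# N(-mu, I); the i-th mean component decays as 1/i, so later features
# carry less discriminative information (an illustrative assumption).
import numpy as np

rng = np.random.default_rng(0)

def sample(n, d, mu):
    """Draw n points per class from N(+mu, I) and N(-mu, I) in R^d."""
    X_pos = rng.normal(loc=mu, scale=1.0, size=(n, d))
    X_neg = rng.normal(loc=-mu, scale=1.0, size=(n, d))
    X = np.vstack([X_pos, X_neg])
    y = np.concatenate([np.ones(n), -np.ones(n)])
    return X, y

d_max, n_train, n_test = 200, 20, 2000
mu = 1.0 / np.arange(1, d_max + 1)  # feature quality decays as 1/i

X_tr, y_tr = sample(n_train, d_max, mu)
X_te, y_te = sample(n_test, d_max, mu)

for d in (1, 2, 5, 10, 20, 50, 100, 200):
    # Plug-in nearest-mean (linear) classifier on the first d features;
    # by symmetry of the class means the decision threshold is 0.
    w = X_tr[y_tr == 1, :d].mean(axis=0) - X_tr[y_tr == -1, :d].mean(axis=0)
    acc = np.mean(np.sign(X_te[:, :d] @ w) == y_te)
    print(f"d = {d:4d}  test accuracy = {acc:.3f}")
```

With a small training set, test accuracy first improves as d grows and then degrades as weakly informative features dilute the estimated direction. Swapping the nearest-mean rule for an off-the-shelf linear SVM is a quick way to probe the abstract's empirical claim that SVM shows the same qualitative rise and fall.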
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Navot, A., Gilad-Bachrach, R., Navot, Y., Tishby, N.: Is Feature Selection Still Necessary? In: Saunders, C., Grobelnik, M., Gunn, S., Shawe-Taylor, J. (eds.) Subspace, Latent Structure and Feature Selection. SLSFS 2005. Lecture Notes in Computer Science, vol. 3940. Springer, Berlin, Heidelberg (2006). https://doi.org/10.1007/11752790_8