Is Feature Selection Still Necessary?

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3940)

Abstract

Feature selection is usually motivated by improved computational complexity, economy and problem understanding, but in many cases it can also improve classification accuracy. In this paper we investigate the relationship between the optimal number of features and the training set size. We present a new and simple analysis of the well-studied two-Gaussian setting. For a few special cases we explicitly find the optimal number of features as a function of the training set size, and show that accuracy can decline dramatically when too many features are added. We then show empirically that the Support Vector Machine (SVM), which was designed to work in the presence of a large number of features, produces the same qualitative result for these examples. This suggests that good feature selection is still an important component of accurate classification.
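
The analysis and experiments are described only at a high level in the abstract. As a rough illustration (not the paper's code), the following Python sketch generates a two-Gaussian problem in which higher-indexed features carry progressively weaker signal, then trains a linear SVM on the first k features of a small training set. The 1/sqrt(i) decay of the class-mean difference, the sample sizes, and the use of scikit-learn's LinearSVC are illustrative assumptions, not the paper's exact setup; test accuracy typically peaks at an intermediate k and degrades as more weak features are added, mirroring the qualitative behaviour the abstract describes.

```python
# Minimal sketch (assumed setup, not the paper's experiment):
# two spherical Gaussians whose class means differ less and less in
# higher-indexed features, so later features add little signal but
# extra estimation noise for a small training set.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_features = 200
n_train, n_test = 30, 2000

# Class means are +/- mu, with mu_i shrinking as 1/sqrt(i) (an assumption).
mu = 1.0 / np.sqrt(np.arange(1, n_features + 1))

def sample(n):
    y = rng.integers(0, 2, size=n) * 2 - 1              # labels in {-1, +1}
    x = y[:, None] * mu[None, :] + rng.standard_normal((n, n_features))
    return x, y

x_train, y_train = sample(n_train)
x_test, y_test = sample(n_test)

# Train a linear SVM on the first k features and report test accuracy;
# accuracy usually peaks at an intermediate k and declines afterwards.
for k in [1, 2, 5, 10, 20, 50, 100, 200]:
    clf = LinearSVC(C=1.0, dual=True, max_iter=10000)
    clf.fit(x_train[:, :k], y_train)
    acc = clf.score(x_test[:, :k], y_test)
    print(f"first {k:3d} features -> test accuracy {acc:.3f}")
```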




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Navot, A., Gilad-Bachrach, R., Navot, Y., Tishby, N. (2006). Is Feature Selection Still Necessary? In: Saunders, C., Grobelnik, M., Gunn, S., Shawe-Taylor, J. (eds) Subspace, Latent Structure and Feature Selection. SLSFS 2005. Lecture Notes in Computer Science, vol 3940. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11752790_8

Download citation

  • DOI: https://doi.org/10.1007/11752790_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-34137-6

  • Online ISBN: 978-3-540-34138-3

  • eBook Packages: Computer Science, Computer Science (R0)
