
Embedded Methods

Chapter in: Feature Extraction

Part of the book series: Studies in Fuzziness and Soft Computing ((STUDFUZZ,volume 207))

Abstract

Although many embedded feature selection methods have been introduced in recent years, a unifying theoretical framework has not yet been developed. We begin this chapter by defining such a framework, which we believe is general enough to cover many embedded methods. We then discuss embedded methods according to how they solve the feature selection problem.
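As a point of reference for what "embedded" means here (selection carried out as part of training the learning machine itself, rather than as a separate filtering or wrapper step), the following is a minimal illustrative sketch using L1-regularized logistic regression, one of the classical embedded approaches in the spirit of the lasso. The synthetic dataset, hyperparameters, and scikit-learn calls are assumptions made for illustration and are not taken from the chapter.

```python
# Illustrative sketch (not from the chapter): an embedded method where
# feature selection emerges as a by-product of fitting the classifier.
# The L1 penalty drives most coefficients exactly to zero, so selection
# and learning are solved jointly rather than in separate stages.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data: 200 samples, 50 features, only 5 of them informative.
X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                           n_redundant=0, random_state=0)

# Smaller C means stronger regularization and hence sparser weight vectors.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(X, y)

# The nonzero weights identify the selected features.
selected = np.flatnonzero(clf.coef_[0])
print(f"{selected.size} of {X.shape[1]} features selected:", selected)
```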




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Lal, T.N., Chapelle, O., Weston, J., Elisseeff, A. (2006). Embedded Methods. In: Guyon, I., Nikravesh, M., Gunn, S., Zadeh, L.A. (eds) Feature Extraction. Studies in Fuzziness and Soft Computing, vol 207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-35488-8_6

  • DOI: https://doi.org/10.1007/978-3-540-35488-8_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-35487-1

  • Online ISBN: 978-3-540-35488-8

  • eBook Packages: Engineering (R0)
