Abstract
Although many embedded feature selection methods have been introduced in recent years, no unifying theoretical framework has been developed to date. We begin this chapter by defining such a framework, which we believe is general enough to cover many embedded methods. We then discuss embedded methods according to how they solve the feature selection problem.
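As a concrete illustration of the embedded idea, the sketch below shows one classical embedded method, the Lasso (Tibshirani, 1996): the L1 penalty is part of the training objective, so features are selected during learning rather than in a separate filtering step. This is a minimal sketch, not code from the chapter; the coordinate-descent solver, the synthetic data, and all names are illustrative assumptions.

```python
# Minimal sketch of an embedded method: Lasso feature selection via
# coordinate descent. The L1 penalty drives some weights exactly to
# zero during training; the surviving features are the selected ones.
# All data and names here are illustrative, not from the chapter.
import numpy as np

def soft_threshold(rho, alpha):
    """Soft-thresholding operator, the proximal map of the L1 norm."""
    return np.sign(rho) * max(abs(rho) - alpha, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_iters=200):
    """Minimize (1/2n)||y - Xw||^2 + alpha*||w||_1.

    Assumes the columns of X are centered and have unit variance, so
    each coordinate update reduces to a single soft-thresholding step.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        for j in range(d):
            # Partial residual: remove feature j's current contribution.
            r_j = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r_j / n
            w[j] = soft_threshold(rho, alpha)
    return w

# Synthetic regression problem: 20 features, only 3 of them relevant.
rng = np.random.default_rng(0)
n, d = 100, 20
X = rng.standard_normal((n, d))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize columns
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.1 * rng.standard_normal(n)

w = lasso_coordinate_descent(X, y, alpha=0.1)
selected = np.flatnonzero(np.abs(w) > 1e-8)
print("selected features:", selected)  # typically recovers features 0, 1, 2
```

Note that selection and fitting happen in one optimization: tightening alpha shrinks more weights to exactly zero, which is what distinguishes embedded methods from filter or wrapper approaches.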
Cite this chapter
Lal, T.N., Chapelle, O., Weston, J., Elisseeff, A. (2006). Embedded Methods. In: Guyon, I., Nikravesh, M., Gunn, S., Zadeh, L.A. (eds) Feature Extraction. Studies in Fuzziness and Soft Computing, vol 207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-35488-8_6