Abstract
Although many embedded feature selection methods have been introduced in recent years, no unifying theoretical framework has been developed to date. We begin this chapter by defining such a framework, which we believe is general enough to cover many embedded methods. We then discuss embedded methods according to how they solve the feature selection problem.
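As a concrete illustration of the embedded idea, the sketch below shows one classical embedded method, the Lasso (Tibshirani, 1996): the L1 penalty is part of the training objective, so features are selected during learning rather than in a separate filtering step. This is a minimal sketch, not code from the chapter; the coordinate-descent solver, the synthetic data, and all names are illustrative assumptions.

```python
# Minimal sketch of an embedded method: Lasso feature selection via
# coordinate descent. The L1 penalty drives some weights exactly to
# zero during training; the surviving features are the selected ones.
# All data and names here are illustrative, not from the chapter.
import numpy as np

def soft_threshold(rho, alpha):
    """Soft-thresholding operator, the proximal map of the L1 norm."""
    return np.sign(rho) * max(abs(rho) - alpha, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_iters=200):
    """Minimize (1/2n)||y - Xw||^2 + alpha*||w||_1.

    Assumes the columns of X are centered and have unit variance, so
    each coordinate update reduces to a single soft-thresholding step.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        for j in range(d):
            # Partial residual: remove feature j's current contribution.
            r_j = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r_j / n
            w[j] = soft_threshold(rho, alpha)
    return w

# Synthetic regression problem: 20 features, only 3 of them relevant.
rng = np.random.default_rng(0)
n, d = 100, 20
X = rng.standard_normal((n, d))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize columns
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.1 * rng.standard_normal(n)

w = lasso_coordinate_descent(X, y, alpha=0.1)
selected = np.flatnonzero(np.abs(w) > 1e-8)
print("selected features:", selected)  # typically recovers features 0, 1, 2
```

Note that selection and fitting happen in one optimization: tightening alpha shrinks more weights to exactly zero, which is what distinguishes embedded methods from filter or wrapper approaches.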
Cite this chapter
Lal, T.N., Chapelle, O., Weston, J., Elisseeff, A. (2006). Embedded Methods. In: Guyon, I., Nikravesh, M., Gunn, S., Zadeh, L.A. (eds) Feature Extraction. Studies in Fuzziness and Soft Computing, vol 207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-35488-8_6