Supervised Learning by Support Vector Machines

Abstract

During the last two decades, support vector machine learning has become a very active field of research, producing a wealth of sophisticated theoretical results as well as exciting real-world applications. This paper gives a brief introduction to the basic concepts of supervised support vector learning and touches on some recent developments in this broad field.
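For orientation, the central object of supervised support vector learning, the soft-margin support vector classifier, can be stated as the following convex optimization problem (a standard textbook formulation added here for context; the symbols w, b, the slack variables ξ_i, the feature map φ, and the regularization parameter C are generic notation, not taken from this entry):

\[
\min_{w,\,b,\,\xi}\;\frac{1}{2}\|w\|^{2}+C\sum_{i=1}^{m}\xi_{i}
\quad\text{subject to}\quad
y_{i}\bigl(\langle w,\phi(x_{i})\rangle+b\bigr)\ge 1-\xi_{i},\;\;\xi_{i}\ge 0,\;\;i=1,\dots,m,
\]

where the pairs (x_i, y_i) with labels y_i ∈ {-1, +1} are the training data and φ maps inputs into the reproducing kernel Hilbert space of a chosen kernel.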

Author information

Correspondence to Gabriele Steidl.


Copyright information

© 2015 Springer Science+Business Media New York

About this entry

Cite this entry

Steidl, G. (2015). Supervised Learning by Support Vector Machines. In: Scherzer, O. (eds) Handbook of Mathematical Methods in Imaging. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-0790-8_22
