Abstract
The Bayes rule is the optimal classification rule when the underlying distribution of the data is known. In practice the underlying distribution is unknown, and classification rules must be "learned" from the data. One way to derive classification rules in practice is to implement the Bayes rule approximately by estimating an appropriate classification function. Traditional statistical methods use the estimated log odds ratio as the classification function. Support vector machines (SVMs) are one type of large margin classifier, and the relationship between SVMs and the Bayes rule has not been clear. In this paper it is shown that the asymptotic target of an SVM is a classification function directly related to the Bayes rule. The rate of convergence of the SVM solution to its target function is established explicitly for SVMs with quadratic or higher order loss functions and spline kernels. Simulations illustrate the relation between SVMs and the Bayes rule in other cases. These results help explain the success of SVMs in many classification studies and make it easier to compare SVMs with traditional statistical methods.
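The connection between the SVM target and the Bayes rule can be checked numerically for the standard hinge loss: at a point x with p = P(Y = +1 | x), the expected hinge loss p(1 - f)₊ + (1 - p)(1 + f)₊ is minimized at f = sign(2p - 1), which carries the same sign as the Bayes rule. The sketch below (function names `hinge_risk` and `hinge_minimizer` are illustrative, not from the paper) verifies this pointwise minimizer by a simple grid search:

```python
import numpy as np

def hinge_risk(f, p):
    # Expected hinge loss at a point x with P(Y=+1|x) = p,
    # for a candidate function value f(x) = f:
    #   p * (1 - f)_+ + (1 - p) * (1 + f)_+
    return p * max(0.0, 1.0 - f) + (1.0 - p) * max(0.0, 1.0 + f)

def hinge_minimizer(p, grid=np.linspace(-2.0, 2.0, 4001)):
    # Grid-search the pointwise minimizer of the expected hinge loss.
    risks = [hinge_risk(f, p) for f in grid]
    return float(grid[int(np.argmin(risks))])

for p in [0.1, 0.3, 0.7, 0.9]:
    f_star = hinge_minimizer(p)
    bayes = 1 if p > 0.5 else -1
    # The minimizer sits at +1 or -1, matching the sign of the Bayes rule.
    print(p, f_star, bayes)
```

The grid search recovers f* ≈ +1 whenever p > 1/2 and f* ≈ -1 whenever p < 1/2, so the sign of the hinge-loss minimizer coincides with the Bayes classification at every point.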
Lin, Y. Support Vector Machines and the Bayes Rule in Classification. Data Mining and Knowledge Discovery 6, 259–275 (2002). https://doi.org/10.1023/A:1015469627679