Summary
The minimum number of misclassifications achievable with affine hyperplanes on a given set of labeled points is a key quantity in both statistics and computational learning theory. However, determining this quantity exactly is NP-hard (cf. Höffgen, Simon and van Horn, 1995), so reasonable approximation procedures are needed. This paper introduces two new approaches to approximating the minimum number of misclassifications achievable with affine hyperplanes. Both are modifications of the regression depth method proposed by Rousseeuw and Hubert (1999) for linear regression models. Our algorithms are compared with the existing regression depth algorithm (cf. Christmann and Rousseeuw, 1999) on various data sets. We also use a support vector machine approach, as proposed by Vapnik (1998), as a reference method.
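For concreteness: with labeled points (x_i, y_i), y_i in {-1, +1}, a hyperplane (w, b) misclassifies point i when y_i(w'x_i + b) <= 0, and the quantity of interest is the minimum of this count over all (w, b). The following Python sketch (illustrative only; the function names are ours, and this is not one of the algorithms compared in the paper) evaluates the count for a candidate hyperplane and computes a naive random-search upper bound:

```python
import numpy as np

def misclassifications(X, y, w, b):
    """Count points on the wrong side of the hyperplane {x : w'x + b = 0}.

    X is an (n, d) array of points, y an (n,) array of labels in {-1, +1};
    point i counts as misclassified when y_i * (w'x_i + b) <= 0.
    """
    return int(np.sum(y * (X @ w + b) <= 0))

def random_hyperplane_bound(X, y, n_trials=10000, seed=0):
    """Naive upper bound on the minimum: sample random hyperplanes, keep the best."""
    rng = np.random.default_rng(seed)
    best = len(y)
    for _ in range(n_trials):
        w = rng.standard_normal(X.shape[1])
        b = rng.standard_normal()
        best = min(best, misclassifications(X, y, w, b))
    return best
```

Any such search only yields an upper bound; the NP-hardness result above concerns computing the exact minimum.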
References
Albert, A. and Anderson, J.A. (1984). On the existence of maximum likelihood estimates in logistic regression models. Biometrika 71, 1–10.
Boser, B., Guyon, I., and Vapnik, V. (1992). A training algorithm for optimal margin classifiers. Proceedings of the 5th Annual Workshop on Computational Learning Theory, 144–152.
Burges, C.J.C. (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery 2, 121–167.
Christmann, A. and Rousseeuw, P.J. (1999). Measuring overlap in logistic regression. Technical Report, University of Dortmund, SFB 475. To appear in: Computational Statistics and Data Analysis. http://www.statistik.uni-dortmund.de/sfb475/berichte/tr25-99-software.zip
Efron, B. (1986). Double exponential families and their use in generalized linear regression. J. Amer. Statist. Assoc., 81, 709–721.
Finney, D.J. (1947). The estimation from individual records of the relationship between dose and quantal response. Biometrika 34, 320–334.
Hermans, J. and Habbema, J.D.F. (1975). Comparison of five methods to estimate posterior probabilities. EDV in Medizin und Biologie 6, 14–19.
Höffgen, K.U., Simon, H.-U. and van Horn, K.S. (1995). Robust trainability of single neurons. J. Computer and System Sciences 50, 114–125.
Hosmer, D.W. and Lemeshow, S. (1989). Applied Logistic Regression. Wiley, New York.
Jaeger, H.J., Mair, T., Geller, M., Kinne, R.K., Christmann, A., Mathias, K.D. (1997). A physiologic in vitro model of the inferior vena cava with a computer-controlled flow system for testing of inferior vena cava filters. Investigative Radiology 32, 511–522.
Jaeger, H.J., Kolb, S., Mair, T., Geller, M., Christmann, A., Kinne, R.K., Mathias, K.D. (1998). In vitro model for the evaluation of inferior vena cava filters: effect of experimental parameters on thrombus-capturing efficacy of the Vena Tech-LGM Filter. Journal of Vascular and Interventional Radiology 9, 295–304.
Joachims, T. (1999). Making large-scale SVM learning practical. In: B. Schölkopf, C. Burges, A. Smola (eds.), Advances in Kernel Methods — Support Vector Learning, MIT Press. http://www-ai.cs.uni-dortmund.de/svm_light
Künsch, H.R., Stefanski, L.A. and Carroll, R.J. (1989). Conditionally unbiased bounded-influence estimation in general regression models, with applications to generalized linear models. J. Amer. Statist. Assoc. 84, 460–466.
Lee, E.T. (1974). A computer program for linear logistic regression analysis. Computer Programs in Biomedicine 4, 80–92.
Novikoff, A. (1962). On convergence proofs on perceptrons. Proceedings of the Symposium on the Mathematical Theory of Automata, Vol. XII, 615–622.
Osuna, E., Freund, R., and Girosi, F. (1997). An improved algorithm for training support vector machines. Proceedings of the IEEE Workshop on Neural Networks for Signal Processing, 276–285.
Platt, J. (1999). Fast training of support vector machines using sequential minimal optimization. In: B. Schölkopf, C. Burges, A. Smola (eds.), Advances in Kernel Methods — Support Vector Learning, MIT Press.
Pires, A.M. (1995). Análise Discriminante: Novos Métodos Robustos de Estimação [Discriminant Analysis: New Robust Estimation Methods]. Ph.D. thesis, Technical University of Lisbon, Portugal.
Pregibon, D. (1981). Logistic regression diagnostics. Ann. Statist. 9, 705–724.
Riedwyl, H. (1997). Lineare Regression und Verwandtes [Linear Regression and Related Topics]. Birkhäuser, Basel.
Rosenblatt, F. (1962). Principles of Neurodynamics. Spartan Books, New York.
Rousseeuw, P.J. and Hubert, M. (1999). Regression Depth. J. Amer. Statist. Assoc., 94, 388–433.
Rousseeuw, P.J. and Struyf, A. (1998). Computing location depth and regression depth in higher dimensions. Statistics and Computing 8, 193–203.
Santner, T.J. and Duffy, D.E. (1986). A note on A. Albert and J.A. Anderson’s conditions for the existence of maximum likelihood estimates in logistic regression models. Biometrika 73, 755–758.
Smola, A.J. (1998). Learning with Kernels. Ph.D. thesis, TU Berlin, GMD Research Series No. 25. http://svm.first.gmd.de/software/logosurvey.html
Vapnik, V. (1998). Statistical Learning Theory. Wiley, New York.
Acknowledgements
The authors thank Prof. P.J. Rousseeuw for helpful discussions, Prof. R.J. Carroll for making available the Food Stamp data set, and Dr. H.J. Jaeger for making available the IVC data set.
Additional information
The financial support of the Deutsche Forschungsgemeinschaft (SFB 475, “Reduction of complexity in multivariate data structures”) is gratefully acknowledged.
Appendix
In the following we give pseudo-code for the heuristic method.
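The sketch below is a minimal Python rendering of one plausible such heuristic, assuming a multi-start local search that randomly perturbs a candidate hyperplane (w, b) and keeps non-worsening moves; the structure and all names (errors, heuristic_hyperplane, step sizes) are our assumption, not a transcription of the published pseudo-code.

```python
import numpy as np

def errors(X, y, w, b):
    # Misclassification count for the hyperplane w'x + b = 0 (labels in {-1, +1}).
    return int(np.sum(y * (X @ w + b) <= 0))

def heuristic_hyperplane(X, y, n_restarts=20, n_steps=500, step=0.1, seed=0):
    """Multi-start local search: perturb (w, b), accept non-worsening moves."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    best = (None, None, n + 1)          # (w, b, misclassification count)
    for _ in range(n_restarts):
        w = rng.standard_normal(d)
        b = rng.standard_normal()
        err = errors(X, y, w, b)
        for _ in range(n_steps):
            w_new = w + step * rng.standard_normal(d)
            b_new = b + step * rng.standard_normal()
            err_new = errors(X, y, w_new, b_new)
            if err_new <= err:          # accept ties so the search can drift
                w, b, err = w_new, b_new, err_new
        if err < best[2]:
            best = (w, b, err)
    return best
```

On linearly separable data such a search typically reaches zero misclassifications; in the overlapping case the returned count is an upper bound on the minimum achievable with affine hyperplanes.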
Cite this article
Christmann, A., Fischer, P. & Joachims, T. Comparison between various regression depth methods and the support vector machine to approximate the minimum number of misclassifications. Computational Statistics 17, 273–287 (2002). https://doi.org/10.1007/s001800200106