Abstract
Following recent results [6] showing the importance of the fat-shattering dimension in explaining the beneficial effect of a large margin on generalization performance, the current paper investigates how the margin on a test example can be used to give greater certainty of correct classification in the distribution-independent model. The results show that even if the classifier does not classify all of the training examples correctly, the fact that a new example has a larger margin than that on the misclassified examples can be used to give very good estimates of generalization performance in terms of the fat-shattering dimension measured at a scale proportional to the excess margin. The estimate relies on a sufficiently large number of the correctly classified training examples having a margin roughly equal to that used to estimate generalization, indicating that the corresponding output values need to be ‘well sampled’. If this is not the case, it may be better to use the estimate obtained from a smaller margin.
This work was supported by the ESPRIT Neurocolt Working Group No. 8556.
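To fix ideas, the flavor of distribution-independent margin bound from [6] on which this analysis builds can be sketched as follows; the constants below are those of [6], and the present paper's refinement (exploiting the excess of a new example's margin over that of the misclassified training examples) has a different exact statement. If a classifier $f$ from a class $F$ achieves margin at least $\gamma$ on all $m$ training examples, then with probability at least $1-\delta$ its generalization error satisfies
\[
\mathrm{er}(f) \;\le\; \frac{2}{m}\Bigl( k \log_2\!\frac{8em}{k}\,\log_2(32m) \;+\; \log_2\!\frac{8m}{\delta} \Bigr),
\qquad k = \mathrm{fat}_F(\gamma/8),
\]
where $\mathrm{fat}_F$ denotes the fat-shattering dimension of $F$. Since $\mathrm{fat}_F(\gamma/8)$ is non-increasing in $\gamma$, the bound improves as the margin grows, which is the effect the present paper extends to the case of imperfectly classified training data.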
References
[1] Noga Alon, Shai Ben-David, Nicolò Cesa-Bianchi and David Haussler, “Scale-sensitive Dimensions, Uniform Convergence, and Learnability,” in Proceedings of the Conference on Foundations of Computer Science (FOCS), 1993. Also to appear in Journal of the ACM.
[2] Martin Anthony and John Shawe-Taylor, “A Result of Vapnik with Applications,” Discrete Applied Mathematics, 47, 207–217, 1993.
[3] Peter Bartlett, “The Sample Complexity of Pattern Classification with Neural Networks: the Size of the Weights is More Important than the Size of the Network,” Technical Report, Department of Systems Engineering, Australian National University, May 1996.
[4] Bernhard E. Boser, Isabelle M. Guyon and Vladimir N. Vapnik, “A Training Algorithm for Optimal Margin Classifiers,” in Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pp. 144–152, ACM, Pittsburgh, 1992.
[5] D. J. C. MacKay, Bayesian Methods for Adaptive Models, Ph.D. Thesis, Caltech, 1991.
[6] John Shawe-Taylor, Peter Bartlett, Robert Williamson and Martin Anthony, “Structural Risk Minimization over Data-Dependent Hierarchies,” NeuroCOLT Technical Report NC-TR-96-51, 1996.
[7] Vladimir N. Vapnik, Estimation of Dependences Based on Empirical Data, Springer-Verlag, New York, 1982.
[8] Vladimir N. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, New York, 1995.
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
Cite this paper
Shawe-Taylor, J. (1997). Confidence estimates of classification accuracy on new examples. In: Ben-David, S. (ed.) Computational Learning Theory. EuroCOLT 1997. Lecture Notes in Computer Science, vol. 1208. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-62685-9_22
Print ISBN: 978-3-540-62685-5
Online ISBN: 978-3-540-68431-2