Randomized hypotheses and minimum disagreement hypotheses for learning with noise

Extended abstract

  • Conference paper
  • In: Computational Learning Theory (EuroCOLT 1997)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1208)

Abstract

In this paper we prove various results about PAC learning in the presence of malicious noise and random classification noise. Our main theme is the use of randomized hypotheses for learning with small sample sizes and high malicious noise rates. We show an algorithm that PAC learns any target class of VC-dimension d using randomized hypotheses and order of d/ε training examples (up to logarithmic factors) while tolerating malicious noise rates even slightly larger than the information-theoretic bound ε/(1+ε) for deterministic hypotheses. Combined with previous results, this implies that a lower bound of order d/Δ + ε/Δ² on the sample size, where η = ε/(1+ε) − Δ is the malicious noise rate, applies only when deterministic hypotheses are used. We then show that the information-theoretic upper bound on the noise rate for deterministic hypotheses can be replaced by 2ε/(1+2ε) if randomized hypotheses are used. Investigating further the use of randomized hypotheses, we show a strategy for learning the powerset of d elements using an optimal sample size of order dε/Δ² (up to logarithmic factors) while tolerating a noise rate η = 2ε/(1+2ε) − Δ. We complement this result by proving that this sample size is also necessary for any class C of VC-dimension d.
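
As a rough numerical illustration of these quantities, the noise-rate thresholds and sample-size orders above can be evaluated directly. The Python sketch below omits all constants and logarithmic factors, and the parameter values ε = 0.1, d = 10, Δ = 0.02 are arbitrary placeholders chosen only for illustration.

def malicious_noise_thresholds(eps):
    # Noise-rate thresholds from the abstract, as functions of the accuracy parameter eps.
    deterministic = eps / (1 + eps)        # information-theoretic bound for deterministic hypotheses
    randomized = 2 * eps / (1 + 2 * eps)   # bound achievable once randomized hypotheses are allowed
    return deterministic, randomized

def sample_size_orders(d, eps, delta):
    # Sample-size orders from the abstract, up to constants and logarithmic factors.
    randomized_upper = d / eps                         # randomized hypotheses, noise rate near eps/(1+eps)
    deterministic_lower = d / delta + eps / delta**2   # lower bound for deterministic hypotheses
    powerset_optimal = d * eps / delta**2              # powerset of d elements, noise rate 2eps/(1+2eps) - delta
    return randomized_upper, deterministic_lower, powerset_optimal

print(malicious_noise_thresholds(0.1))    # roughly (0.091, 0.167)
print(sample_size_orders(10, 0.1, 0.02))  # roughly (100, 750, 2500)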

We then discuss the performance of the minimum disagreement strategy under both the malicious noise and the random classification noise models. For malicious noise we show an algorithm that, using deterministic hypotheses, learns unions of d intervals on the continuous domain [0, 1) with a sample size significantly smaller than that needed by the minimum disagreement strategy. For classification noise we show, generalizing a result by Laird, that order of d/(εΔ²) training examples suffice (up to logarithmic factors) to learn any target class of VC-dimension d by minimizing disagreements, tolerating a random classification noise rate η = 1/2 − Δ. Using a lower bound by Simon, we also prove that this sample size bound cannot be significantly improved.
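
For concreteness, the minimum disagreement strategy itself can be sketched as follows. This is an illustrative sketch over a toy finite class of threshold functions on [0, 1), not the unions of intervals or the noise models analyzed here; the class, sample size, and noise rate are arbitrary placeholders.

import random

def minimum_disagreement(hypotheses, sample):
    # Return the hypothesis with the fewest disagreements on the labeled sample.
    return min(hypotheses, key=lambda h: sum(1 for x, y in sample if h(x) != y))

# Toy hypothesis class: threshold functions x -> [x >= t] with t in {0, 1/20, ..., 1}.
hypotheses = [(lambda x, t=t / 20: 1 if x >= t else 0) for t in range(21)]
target = hypotheses[10]   # true threshold at 0.5
eta = 0.2                 # random classification noise rate

# Draw a sample from the uniform distribution on [0, 1) and flip each label with probability eta.
sample = []
for _ in range(200):
    x = random.random()
    y = target(x)
    if random.random() < eta:
        y = 1 - y
    sample.append((x, y))

best = minimum_disagreement(hypotheses, sample)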

References

  1. D. Angluin and P.D. Laird. Learning from noisy examples. Machine Learning, 2:343–370, 1988.

  2. N. Cesa-Bianchi, E. Dichterman, P. Fischer, and H. Simon. Noise-tolerant learning near the information-theoretic bound. In Proceedings of the 28th ACM Symposium on the Theory of Computing, pages 141–150. ACM Press, 1996.

  3. A. Ehrenfeucht, D. Haussler, M. Kearns, and L. Valiant. A general lower bound on the number of examples needed for learning. Information and Computation, 82(3):247–261, 1989.

  4. M. Kearns and R.E. Schapire. Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 48(3):464–497, 1994. An extended abstract appeared in the Proceedings of the 30th Annual Symposium on Foundations of Computer Science.

  5. M.J. Kearns, R.E. Schapire, and L. Sellie. Toward efficient agnostic learning. Machine Learning, 17(2):115–141, 1994.

  6. M.J. Kearns and M. Li. Learning in the presence of malicious errors. SIAM Journal on Computing, 22(4):807–837, 1993. A preliminary version appeared in the Proceedings of the 20th ACM Symposium on the Theory of Computing.

  7. P.D. Laird. Learning from Good and Bad Data. Kluwer, 1988.

  8. H.U. Simon. General bounds on the number of examples needed for learning probabilistic concepts. Journal of Computer and System Sciences, 52:239–254, 1996.

  9. L. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, 1984.

  10. V.N. Vapnik. Estimation of Dependences Based on Empirical Data. Springer-Verlag, 1982.

Editor information

Shai Ben-David

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cesa-Bianchi, N., Fischer, P., Shamir, E., Simon, H.U. (1997). Randomized hypotheses and minimum disagreement hypotheses for learning with noise. In: Ben-David, S. (eds) Computational Learning Theory. EuroCOLT 1997. Lecture Notes in Computer Science, vol 1208. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-62685-9_11

  • DOI: https://doi.org/10.1007/3-540-62685-9_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-62685-5

  • Online ISBN: 978-3-540-68431-2
