The optimal PAC bound for intersection-closed concept classes☆
Introduction
Since Valiant introduced the PAC-learning framework in [1], it has been an open problem to give sharp bounds on the number of labeled examples necessary to learn successfully in the realizable setting (readers unfamiliar with this framework may want to read Section 2 first, which gives a succinct definition of PAC-learning). The best known worst-case lower bound, which was proven by Ehrenfeucht, Haussler, Kearns and Valiant in [2], is $\Omega\left(\frac{d + \log\frac{1}{\delta}}{\varepsilon}\right)$, where d is the VC-dimension of the concept class, ε is the accuracy and δ is the confidence parameter. On the other hand, the best matching upper bound, which was proven by Blumer, Ehrenfeucht, Haussler and Warmuth [3], is $O\left(\frac{d \log\frac{1}{\varepsilon} + \log\frac{1}{\delta}}{\varepsilon}\right)$. Obviously, these bounds leave a gap of a factor of $\log\frac{1}{\varepsilon}$.
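To make the gap concrete, the following sketch (not from the paper; all hidden constants are set to 1 purely for illustration) evaluates the shapes of the two bounds and shows the ratio growing with $1/\varepsilon$:

```python
import math

def lower_bound(d, eps, delta):
    # Shape of the lower bound: (d + log(1/delta)) / eps, constant set to 1.
    return (d + math.log(1 / delta)) / eps

def upper_bound(d, eps, delta):
    # Shape of the upper bound: (d * log(1/eps) + log(1/delta)) / eps, constant set to 1.
    return (d * math.log(1 / eps) + math.log(1 / delta)) / eps

for eps in (0.1, 0.01, 0.001):
    lo = lower_bound(10, eps, 0.05)
    hi = upper_bound(10, eps, 0.05)
    print(f"eps={eps}: lower ~ {lo:.0f}, upper ~ {hi:.0f}, ratio ~ {hi / lo:.2f}")
```

The ratio between the two shapes grows like $\log\frac{1}{\varepsilon}$ as the accuracy parameter shrinks, which is exactly the gap the paper closes for intersection-closed classes.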
Warmuth conjectured in [4] that the factor of $\log\frac{1}{\varepsilon}$ in the upper bound can be overcome. He furthermore conjectured that this can be achieved by the one-inclusion graph algorithm, which—in the case of intersection-closed concept classes—collapses to the closure algorithm, whose output is the smallest consistent hypothesis. For some special intersection-closed classes, which possess an additional combinatorial property, Auer and Ortner provided a proof of the conjecture in [5]. We will show that Warmuth's conjecture is indeed true for all intersection-closed classes.
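As a concrete illustration of the closure algorithm (this example is not from the paper), consider the intersection-closed class of axis-aligned rectangles in the plane: the smallest concept consistent with the positive examples is simply their bounding box, which equals the intersection of all rectangles containing them.

```python
def closure_rectangle(positives):
    """Smallest axis-aligned rectangle containing all positive points.

    Axis-aligned rectangles are intersection-closed, so this bounding box
    equals the intersection of all rectangles consistent with the positive
    examples -- i.e., the closure algorithm's hypothesis.
    """
    xs = [p[0] for p in positives]
    ys = [p[1] for p in positives]
    return (min(xs), max(xs), min(ys), max(ys))

def predict(rect, point):
    """Label a point according to the hypothesis rectangle."""
    x_lo, x_hi, y_lo, y_hi = rect
    return x_lo <= point[0] <= x_hi and y_lo <= point[1] <= y_hi

rect = closure_rectangle([(1, 2), (3, 0), (2, 5)])
print(rect)                   # (1, 3, 0, 5)
print(predict(rect, (2, 2)))  # True
print(predict(rect, (4, 1)))  # False
```

Note that the closure hypothesis can only err by labeling a true positive as negative; this one-sided-error property is what makes intersection-closed classes amenable to the analysis in the paper.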
We will prove our main result by employing a technique which involves the so-called disagreement coefficient. The disagreement coefficient was first used in learning theory, without calling it by this name, by Giné and Koltchinskii in a 2006 paper on agnostic learning [6], which was based on the work of Alexander [7], but it was independently discovered by Hanneke in 2007 [8] to analyze the label complexity of agnostic active learning. While most applications of the disagreement coefficient are in the field of agnostic active learning, we are aware of two exceptions where it was used in the standard, i.e., realizable and passive, setting: one section of Hanneke's Ph.D. thesis [9, page 50ff] and an article by Darnstädt, Simon and Szörényi [10]. We will combine the ideas from these two works and improve on Hanneke's results in Section 4.
Section snippets
Notation and basic definitions
We introduce some notation and remind the reader of the definitions of PAC-learning [1] and the VC-dimension [11]:
For any set A, let $2^A$ denote the power set of A. Let X be a nonempty set, called the domain, let D be a distribution over X and, for any (measurable) $S \subseteq X$, let $D(S)$ denote the probability mass of S under D. A concept class is a set $\mathcal{C} \subseteq 2^X$.
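For readers who prefer a computational picture, these objects can be sketched directly for a small finite domain (the particular domain, distribution and class below are illustrative choices, not taken from the paper):

```python
from itertools import chain, combinations

X = {1, 2, 3}                   # a finite domain (illustration only)
D = {1: 0.5, 2: 0.3, 3: 0.2}    # a distribution over X

def mass(S):
    """Probability mass D(S) of a subset S of X."""
    return sum(D[x] for x in S)

def power_set(A):
    """All subsets of A, i.e. the power set 2^A."""
    A = list(A)
    return [set(s) for s in
            chain.from_iterable(combinations(A, r) for r in range(len(A) + 1))]

# A concept class is any set of subsets of X, e.g. the "initial segments":
C = [set(), {1}, {1, 2}, {1, 2, 3}]
print(mass({1, 3}))  # 0.7
```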
Disagreement coefficients
Hanneke's original definition of the disagreement coefficient is as follows: Definition 1 (See Hanneke [8].) For $t \in \mathcal{C}$ and $r > 0$, let $B(t, r) = \{c \in \mathcal{C} : D(c \mathbin{\triangle} t) \le r\}$ be the ball of radius r around concept t, where the disagreement region of a set $V \subseteq \mathcal{C}$ is $\mathrm{DIS}(V) = \{x \in X : \exists\, c, c' \in V \text{ with } x \in c \mathbin{\triangle} c'\}$. Then Hanneke's disagreement coefficient is defined as $\theta_t = \sup_{r > 0} \frac{D(\mathrm{DIS}(B(t, r)))}{r}$. Taking the supremum over all $t \in \mathcal{C}$ and all distributions D over X yields the distribution-independent disagreement coefficient $\theta$.
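On a finite domain this quantity can be computed exactly, which may help build intuition. The sketch below (an illustration, not code from the paper) evaluates $\theta_t$ for the class of initial segments of $\{1, \dots, 4\}$ under the uniform distribution; since the class is finite, the supremum over r is attained at one of the finitely many distances $D(c \mathbin{\triangle} t)$:

```python
def theta(C, t, D):
    """Disagreement coefficient theta_t of target t in a finite class C
    under distribution D, following Definition 1:
    theta_t = sup_{r > 0} D(DIS(B(t, r))) / r,
    with B(t, r) = {c in C : D(c symdiff t) <= r}."""
    def mass(S):
        return sum(D[x] for x in S)

    # Candidate radii: the positive distances from t to members of C.
    dists = sorted({mass(c ^ t) for c in C if mass(c ^ t) > 0})
    best = 0.0
    for r in dists:
        ball = [c for c in C if mass(c ^ t) <= r]
        # Disagreement region: points where some pair in the ball differs.
        dis = {x for c in ball for c2 in ball for x in c ^ c2}
        best = max(best, mass(dis) / r)
    return best

# Initial segments of {1,...,4} under the uniform distribution:
X = [1, 2, 3, 4]
D = {x: 0.25 for x in X}
C = [set(X[:i]) for i in range(5)]   # {}, {1}, {1,2}, {1,2,3}, {1,2,3,4}
print(theta(C, {1, 2}, D))           # 2.0
```

For this target, the ball of radius 1/4 already has a disagreement region of mass 1/2, giving $\theta_t = 2$.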
Darnstädt, Simon and Szörényi considered the following variant,
PAC bounds
While the main application of the disagreement coefficient is in the field of agnostic active learning, Hanneke also proved the following theorem for the realizable framework: Theorem 6 (See Hanneke [9].) Let $\mathcal{C}$ be a concept class of VC-dimension d, D a distribution over X, S a random sample of size m and $t \in \mathcal{C}$ the target. Then there is a universal constant c such that, with probability at least $1 - \delta$, every $h \in \mathcal{C}$ consistent with S satisfies $\mathrm{er}(h) \le \frac{c}{m}\left(d \log \theta_t + \log\frac{1}{\delta}\right)$. Thus the sample complexity is upper bounded by $O\!\left(\frac{1}{\varepsilon}\left(d \log \theta_t + \log\frac{1}{\delta}\right)\right)$.
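As a rough numerical illustration (constants are dropped, so this is a sketch of the bound's shape rather than the exact statement), one can evaluate the sample-complexity expression and observe that whenever the disagreement coefficient is bounded by a constant, the bound collapses to the shape of the general lower bound, $(d + \log\frac{1}{\delta})/\varepsilon$:

```python
import math

def hanneke_style_bound(d, theta_t, eps, delta):
    """Shape of the Theorem-6-style sample complexity,
    (d * log(theta_t) + log(1/delta)) / eps, with constants dropped.
    Illustrative only -- not the paper's exact statement."""
    return (d * math.log(theta_t) + math.log(1 / delta)) / eps

# With theta_t bounded by a constant (here e, so log(theta_t) = 1), the
# expression reduces to (d + log(1/delta)) / eps:
print(hanneke_style_bound(10, math.e, 0.01, 0.05))
```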
Combining Hanneke's result with the
Conclusions and final remarks
We improved on Hanneke's PAC bound for consistent learners and showed that the closure algorithm learns intersection-closed classes with a sample size that matches the general lower bound. To our knowledge this is the first proof of sharp sample complexity bounds for a natural family of classes in the realizable PAC-learning framework.
We demonstrated that the disagreement coefficient can be used to prove non-trivial, long-sought theorems in realizable PAC-learning. We believe that the
References (11)
- M. Darnstädt et al., Supervised learning and co-training, Theor. Comput. Sci. (2014)
- L.G. Valiant, A theory of the learnable, Commun. ACM (1984)
- A. Ehrenfeucht et al., A general lower bound on the number of examples needed for learning, Inf. Comput. (1989)
- A. Blumer et al., Learnability and the Vapnik–Chervonenkis dimension, J. ACM (1989)
- M.K. Warmuth, The optimal PAC algorithm, in: COLT (2004)
- ☆ This work was supported by the bilateral Research Support Programme between Germany (DAAD 50751924) and Hungary (MÖB 14440).
- 1 Tel.: +49 234 32 23209.